1. Introduction
Automatic target recognition (ATR) from synthetic aperture radar (SAR) images is of great importance owing to its pervasive applications in both military and civil fields [1]. Over the years, researchers have sought proper feature extraction methods and classification schemes for SAR ATR. Principal component analysis (PCA) and linear discriminant analysis (LDA) [2] are commonly used for feature extraction from SAR images. Other features, such as geometrical descriptors [3], attributed scattering centers [4,5], and monogenic spectrums [6], have also been applied to SAR target recognition. As for the decision engines, various classifiers, including support vector machines (SVM) [7], sparse representation-based classification (SRC) [8,9], and convolutional neural networks (CNN) [10], have been employed for target recognition and have achieved promising results. Despite these efforts, SAR target recognition under extended operating conditions (EOCs) [1] remains a difficult problem.
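Among these decision engines, SRC classifies a test sample by sparsely coding it over a dictionary of training samples and then comparing class-wise reconstruction residuals. The sketch below illustrates this minimum-residual rule; the sparse coding step here uses a simple orthogonal matching pursuit, whereas published SRC variants often use l1 minimization, and all function names are illustrative:

```python
import numpy as np

def src_classify(y, D, labels, n_nonzero=5):
    """Sketch of sparse representation-based classification (SRC).

    y: test feature vector; D: dictionary with one l2-normalized
    training sample per column; labels: class label of each column.
    """
    support, residual = [], y.copy()
    for _ in range(n_nonzero):
        # Greedily pick the atom most correlated with the residual (OMP).
        k = int(np.argmax(np.abs(D.T @ residual)))
        if k not in support:
            support.append(k)
        coef, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ coef
    x = np.zeros(D.shape[1])
    x[support] = coef
    # Minimum class-wise reconstruction residual decides the label.
    classes = np.unique(labels)
    residuals = [np.linalg.norm(y - D[:, labels == c] @ x[labels == c])
                 for c in classes]
    return classes[int(np.argmin(residuals))], np.array(residuals)
```

The class-wise residuals returned here are exactly the quantities that the multi-view methods discussed below fuse or threshold.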
Most previous SAR ATR methods make use of a single-view SAR image. The single-view image is compared with the training samples to find its nearest neighbor in the manifolds spanned by the individual training classes. Due to the unique mechanism of SAR imaging, an SAR image is highly sensitive to the view angle [11]. Hence, the use of multi-view SAR images may aid the interpretation of SAR images. Indeed, multi-view SAR exploitation has been successfully applied to SAR image registration [12]. As for SAR target recognition, the classification performance is expected to vary with the view angle, and the use of multi-view images will probably improve the effectiveness and robustness of SAR ATR systems. As in the case of remote sensing data fusion [13], the benefits of using multi-view SAR images can be analyzed from two aspects. On the one hand, the images from different views provide complementary descriptions of the target. On the other hand, the inner correlation among different views is also discriminative for target recognition. Therefore, the exploitation of multiple incoherent views of a target should provide more robust classification performance than single-view classification. Several ATR algorithms based on multi-view SAR images have been proposed, which can be broadly divided into three categories. The first category uses parallel decision fusion for multi-view SAR images, based on the assumption that the multiple views are independent. Brendel et al. [14] analyze the fundamental benefits of aspect diversity for SAR ATR based on the experimental results of a minimum-square-error (MSE) classifier. A Bayesian multi-view classifier [15] is proposed by Brown for target classification; the results demonstrate that the use of multiple SAR images can significantly improve an ATR algorithm's performance even with only two or three views. Bhanu et al. [16] employ scatterer locations to describe the azimuthal variance, with application to SAR target recognition; their experiments demonstrate that the correlation of SAR images is maintained only over a small azimuth interval. Model-based approaches [17] are developed by Ettinger and Snyder that fuse multiple images of a target at different view angles, namely decision-level and hypothesis-level fusion. Vespe et al. [18] propose a multi-perspective target classification method, which uses function neural networks to combine multiple views of a target collected at different locations. Huan et al. [19] propose a parallel decision fusion strategy for SAR target recognition using multi-aspect SAR images based on SVM. The second category uses a data fusion strategy. Methods of this category first fuse the multi-view images to generate a new input, which is assumed to retain the discriminability of the individual views as well as their inner correlation; the fused data is then employed for target recognition. Two data fusion methods for multi-view SAR images are proposed by Huan et al., i.e., a PCA method and a discrete wavelet transform (DWT) fusion method [19]. The third category uses joint decision fusion. Rather than classifying the multi-view images separately and then fusing the outputs, the joint decision strategy places all the views in a unified decision framework by exploiting their inner correlation. Zhang et al. [20] apply joint sparse representation (JSR) to multi-view SAR ATR, which can exploit the inner correlations among different views; the superiority of multiple views over a single view is quantitatively demonstrated in their experiments. In comparison, the first category neglects the inner correlation among multi-view images. Although the data fusion methods consider both individuality and inner correlation, it is hard to evaluate the discriminability lost during the fusion process. The third category considers the inner correlation in the decision process, but the strategy may not work well when the multi-view images are not closely related.
In the literature reviewed above, multi-view recognition has been demonstrated to be much more effective than single-view recognition. However, some practical restrictions on multi-view recognition have been neglected. Due to the EOCs in real scenarios, some of the collected views may be severely contaminated, and it is unwise to use such views during multi-view recognition. In this paper, a novel structure for multi-view recognition is proposed. The multiple views of a target are first classified by SRC. Based on the reconstruction residuals, a principle is designed to judge whether a certain view is discriminative enough for multi-view recognition. Then, JSR is employed to classify the selected views. However, when the input views are not closely related, the joint representation over the global dictionary is not optimal due to an incorrect correlation constraint. As a remedy, the atoms selected by SRC for each input view are combined to construct an enhanced local dictionary: the selected atoms and their neighboring ones (those with nearby azimuths) together form the enhanced local dictionary, which inherits the representation capability of the global dictionary for the selected views. Meanwhile, it constrains the atoms available for representation, thus avoiding the incorrect atom selections that occur with the global dictionary, especially when the multiple views are not closely related. By performing JSR on the enhanced local dictionary, both correlated and unrelated views can be represented properly. Finally, the target label is decided from the JSR residuals according to the minimum residual principle.
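The view-selection-then-fusion idea can be illustrated with a highly simplified sketch. The ratio test, the threshold `tau`, and the additive residual fusion below are illustrative assumptions only; the paper's actual principle operates on SRC reconstruction residuals and re-solves a JSR problem over the enhanced local dictionary rather than simply summing per-view residuals:

```python
import numpy as np

def select_and_fuse(view_residuals, tau=0.8):
    """view_residuals[i][c]: reconstruction residual of view i
    against class c (e.g., from per-view SRC).

    A view is kept only when its smallest residual is clearly below
    its second smallest (ratio < tau), i.e., the view discriminates
    between classes; a severely contaminated view tends to fit every
    class equally badly. Kept views are fused by summing class-wise
    residuals and taking the minimum-residual class.
    """
    R = np.asarray(view_residuals, dtype=float)
    keep = []
    for i, res in enumerate(R):
        r = np.sort(res)
        if r[0] / r[1] < tau:          # discriminative enough
            keep.append(i)
    if not keep:                       # fall back to using all views
        keep = list(range(len(R)))
    fused = R[keep].sum(axis=0)
    return int(np.argmin(fused)), keep
```

In this toy rule, a flat residual profile (all classes fit about equally poorly) marks a view as contaminated and excludes it from the fused decision, mirroring the motivation for discarding non-discriminative views before JSR.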
In the following, we first introduce the practical restrictions on multi-view recognition in Section 2. Then, in Section 3, the proposed structure for multi-view recognition is presented. Extensive experiments are conducted on the moving and stationary target acquisition and recognition (MSTAR) dataset in Section 4, and finally, conclusions are drawn in Section 5.