The acoustic and seismic signal features of AAV3, shown in
Figure 3, indicate that some of the WCER feature variables differ evidently and can reflect the target characteristics, whereas the differences between the remaining feature variables are insignificant. Simplifying the feature vector removes the redundant variables, which decreases the computational complexity of the subsequent target classification and effectively reduces its time consumption. The hierarchical cluster method [21] and the principal component analysis (PCA) method [22] are two effective approaches for reducing the dimension of the feature matrix.
3.1. Hierarchical Cluster Method
The hierarchical cluster method can aggregate variables into several clusters based on their similarity. Representative variables are then selected from the aggregated clusters, reducing the number of variables and hence the dimension of the feature vector. The feature vector of the signal obtained using the WCER method is f = [W1, W2, …, Wm], where m = 2J and J is the desired wavelet decomposition depth. The observation matrix of the feature vector of the target signal is denoted as F; each of its column vectors is the data sequence of one variable in the feature vector, f. By analyzing the sample observation matrix, the main variables in the feature vector are selected to realize its dimensionality reduction.
The hierarchical cluster method utilizes the distances between the variables of the feature vector, f, and between clusters to quantify their similarity during clustering. First, the pair of nearest feature variables is merged into a new cluster according to the calculated feature variable distance matrix, D, expressed in Equation (7); then, the two closest clusters are merged. This process is repeated until a preset number of clusters is achieved:

D = [dist(Wu, Wv)] (u, v = 1, 2, …, m)(7)

where dist(Wu, Wv) represents the distance between the variable, Wu, and the variable, Wv; Wu is the uth variable in the feature vector and Wv is the vth variable.
Figure 4 shows the distance between variables in the feature vector and between clusters in the clustering process.
d(r, s) represents the distance between the cluster, r, and the cluster, s.
The distance between the variables, Wu and Wv, is calculated using their correlation coefficient, as expressed in Equation (8):

dist(Wu, Wv) = 1 − R(Wu, Wv)(8)
where R(Wu, Wv) represents the correlation coefficient of Wu and Wv, defined as R(Wu, Wv) = C(Wu, Wv)/(SuSv), with Su and Sv the standard deviations of the two variables. C(Wu, Wv) is the covariance of Wu and Wv, and it is calculated as follows:

C(Wu, Wv) = (1/n) Σ (wuk − μu)(wvk − μv), summed over k = 1, …, n(9)

where n is the length of the data sequence, wuk and wvk are the kth observations of Wu and Wv, and μu and μv are their means.
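As a concrete sketch, the distance matrix can be computed from the sample correlation matrix; the toy observation matrix below, and the 1 − R form of the distance, are illustrative assumptions rather than the paper's data:

```python
import numpy as np

# Toy observation matrix F: each column is the data sequence of one
# feature variable W_i (illustrative data, not from the paper).
rng = np.random.default_rng(0)
F = rng.standard_normal((100, 4))
F[:, 1] = 0.9 * F[:, 0] + 0.1 * F[:, 1]  # make W2 strongly correlated with W1

# Correlation coefficient matrix R(Wu, Wv) = C(Wu, Wv) / (Su * Sv)
R = np.corrcoef(F, rowvar=False)

# Distance matrix D of Equation (7), assuming the 1 - R correlation distance
D = 1.0 - R
```

Strongly correlated variables, such as W1 and W2 here, end up at a much smaller distance than uncorrelated ones, which is what drives the later merging step.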
After obtaining the distance matrix, we must also calculate the distance between the clusters in the clustering process. In this study, the unweighted pair group method with arithmetic mean (UPGMA) [23] was utilized to calculate the distance between clusters. UPGMA is a widely used bottom-up hierarchical clustering method that defines cluster similarity [24] as the average pairwise distance between all the objects in two different clusters. If the two clusters in the clustering process are the cluster, r, and the cluster, s, the distance between them under the UPGMA is expressed in Equation (10):

d(r, s) = (1/(nr·ns)) ΣΣ dist(Wrp, Wsq), summed over p = 1, …, nr and q = 1, …, ns(10)
where nr is the number of objects in cluster r, ns is the number of objects in cluster s, Wrp is the pth object in cluster r, and Wsq is the qth object in cluster s.
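Under the same illustrative assumptions (toy data, 1 − R distance), the UPGMA step corresponds to average linkage in SciPy; a minimal sketch:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage
from scipy.spatial.distance import squareform

rng = np.random.default_rng(1)
base = rng.standard_normal((200, 1))
# Toy variables: W1 and W2 are near-duplicates, W3 and W4 are independent
F = np.hstack([base + 0.05 * rng.standard_normal((200, 2)),
               rng.standard_normal((200, 2))])

# Correlation-based distance matrix (assumed 1 - R form), condensed for linkage()
D = 1.0 - np.corrcoef(F, rowvar=False)
Z = linkage(squareform(D, checks=False), method='average')  # 'average' == UPGMA
```

Each row of the linkage matrix Z records one merge and its UPGMA distance; the near-duplicate pair is merged first, at a distance close to zero.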
3.2. PCA Method
The PCA method is another efficient approach to reduce the dimension of the feature vector. This approach reduces the dimension of the feature vector by extracting the several factors that contribute the most to signal differences in the observation matrix of the target feature vector.
The feature vector of the signal is f = [W1, W2, …, Wm], and the column vector, [wi1, wi2, …, win], in the feature matrix, F, is the data sequence of the variable, Wi, in the feature vector, f, with n the length of the data sequence. Each variable should be normalized according to Equation (11) before performing principal component analysis:

wik* = (wik − μi)/Si (k = 1, 2, …, n)(11)
where μi is the mean and Si is the standard deviation of the data sequence of the variable, Wi. After calculating the correlation coefficients of the normalized variables in the feature vector, the correlation coefficient matrix, C, is obtained.
Then, all the eigenvalues, λi (i = 1, 2, …, m), of the correlation coefficient matrix, C, and the corresponding eigenvectors, [ui1, ui2, …, uim], are calculated. The eigenvectors then yield m new indicator variables, shown as follows:

yi = ui1W1 + ui2W2 + … + uimWm (i = 1, 2, …, m)(12)
where yi is the ith principal component. The contribution rate of the principal component, which indicates how much the new indicator variable contributes to the differences among the target features, is calculated from its corresponding eigenvalue:

αi = λi/(λ1 + λ2 + … + λm)(13)
When the sum of the first P contribution rates of the new indicator variables approaches 1 (a threshold of 0.9 or 0.95 is usually used), the first P indicator variables, [y1, y2, …, yP], are selected as the new feature vector instead of the original m features.
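The whole PCA pipeline above, normalization, eigendecomposition of the correlation matrix, and selection of the first P components by cumulative contribution rate, can be sketched in numpy; the synthetic two-factor data are an illustrative assumption:

```python
import numpy as np

# Synthetic observation matrix with two dominant latent factors (illustrative)
rng = np.random.default_rng(2)
n, m = 300, 6
latent = rng.standard_normal((n, 2))
F = latent @ rng.standard_normal((2, m)) + 0.1 * rng.standard_normal((n, m))

# Equation (11): normalize every variable to zero mean and unit standard deviation
Fz = (F - F.mean(axis=0)) / F.std(axis=0)

# Eigenvalues/eigenvectors of the correlation coefficient matrix C
C = np.corrcoef(Fz, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(C)                # ascending eigenvalues
eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]  # sort descending

# Contribution rate of each principal component: lambda_i / sum(lambda)
contrib = eigvals / eigvals.sum()
P = int(np.searchsorted(np.cumsum(contrib), 0.95)) + 1

# The first P principal components replace the original m features
Y = Fz @ eigvecs[:, :P]
```

Because the synthetic data are driven by two latent factors, only a few principal components are needed to reach the 0.95 cumulative contribution threshold, and the m-dimensional feature vector is replaced by the P-dimensional one.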
3.3. Feature Vector Simplification Using Hierarchical Clustering
The acoustic and seismic signal features of AAV3 were considered as examples for simplifying the variables in the feature vectors using the hierarchical clustering method.
Figure 5 shows the clustering results of the feature vectors of AAV3. Eight-level wavelet decomposition of the acoustic and seismic signals of AAV3 yields two feature vectors with 16 variables. The acoustic feature vector is denoted as
fa and the seismic feature vector is denoted as
fs. After clustering variables with a distance less than 1, the variables with large differences in the feature vectors are retained and the variables with small differences are aggregated into a cluster.
The 16 variables in the acoustic feature vector, fa, were reduced to 9 after feature vector simplification. The variables, W1, W2, W3, and W4, were aggregated into a cluster and replaced with the variable, W4. The variable, W12, was utilized instead of the variables, W9, W10, W11, and W12, after they were aggregated into a cluster. Further, the variables, W15 and W16, were aggregated into a cluster and the variable, W15, was subsequently utilized instead.
The 16 variables in the seismic feature vector, fs, were reduced to 7. The variables, W1, W2, W3, and W4, were aggregated into a cluster and replaced with the variable, W4. The variable, W14, was utilized instead of the variables, W9, W10, W11, W12, W13, and W14, after they were aggregated into a cluster. Further, the variables, W15 and W16, were aggregated into a cluster and the variable, W15, was subsequently utilized instead.
The simplified acoustic and seismic feature vectors are denoted as fanew and fsnew, where fanew = [W4, W5, W6, W7, W8, W12, W13, W14, W15], which corresponds to [WDcer(4), …, WDcer(8); WAcer(4), …, WAcer(7)]. fsnew = [W4, W5, W6, W7, W8, W14, W15], which corresponds to [WDcer(4), …, WDcer(8); WAcer(6), WAcer(7)].
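The retain-one-representative-per-cluster step can be sketched with SciPy's fcluster; the toy data, the 1 − R distance, and the cut threshold (0.5 here, rather than the distance-1 cut used for Figure 5) are illustrative assumptions:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

rng = np.random.default_rng(3)
base = rng.standard_normal((200, 1))
# Toy feature matrix: variables 0-2 are near-duplicates, 3-4 are independent
F = np.hstack([base + 0.05 * rng.standard_normal((200, 3)),
               rng.standard_normal((200, 2))])

D = 1.0 - np.corrcoef(F, rowvar=False)
Z = linkage(squareform(D, checks=False), method='average')

# Cut the dendrogram at the chosen distance and keep the highest-indexed
# variable of each cluster as its representative (as W4 replaces W1...W4 above)
labels = fcluster(Z, t=0.5, criterion='distance')
reps = sorted(int(np.flatnonzero(labels == c)[-1]) for c in np.unique(labels))
F_new = F[:, reps]
```

The three near-duplicate variables collapse into one cluster represented by the last of them, while the two independent variables survive as their own clusters, so the five-variable toy vector is simplified to three variables.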
The results of using simplified features for target classification will be discussed in
Section 5.1.