1. Introduction
The surveying and mapping of underwater topography for coastal zones, islands, and reefs is one of the most important research fields in oceanography and provides key geographic information for nearshore navigation [1,2,3], ocean geomorphology [4,5], coral reef studies [6,7], and hydrography [8,9]. Currently, the mapping of underwater topography in coastal zones and island areas depends primarily on airborne light detection and ranging (LiDAR) and spaceborne photogrammetry, because the efficiency of shipborne acoustic systems in these areas is extremely low [10,11]. However, the detection area of airborne LiDAR is usually limited by the airborne platform [12]. The other mainstream method, spaceborne photogrammetry based on high-resolution multispectral stereo images, can efficiently obtain expansive underwater topography of coastal, island, and reef areas [13]. In spaceborne photogrammetry, remote sensing image matching is a critical and necessary procedure that directly affects the correctness and accuracy of the derived underwater topography [14]. The quality of the match also directly influences the performance of downstream applications such as change detection, object detection and tracking, and image stitching. Compared with conventional image matching for land areas [15,16,17], the matching of remote sensing images of coastal zones and islands involves areas covered by a water column of varying depth, which weakens the image texture and significantly increases the matching difficulty [18,19].
The process of remote sensing image matching can be divided into two steps: (1) extraction and matching of point pairs from the stereo image pairs, and (2) optimization of the matching points [20]. Methods for the extraction and matching of point pairs can generally be classified into two categories: area-based methods and feature-based methods [21]. Area-based methods, also referred to as correlation-like or template-matching methods, establish correspondence between reference and sensed pixels using a similarity measure. Feature-based methods aim to identify distinctive features in images and then match these features to establish correspondence between the reference and sensed images. However, area-based matching methods are only suitable for homologous images with linear radiometric differences and face challenges when used to extract common features between image pairs. Conversely, feature-based matching is more suitable for situations involving significant offsets, geometric distortions, or scale differences between two images [22]. Typical solutions rely on feature detectors, descriptors, and matchers to generate putative correspondences. At present, feature-matching methods such as the scale-invariant feature transform (SIFT) [23], speeded-up robust features (SURF) [24], accelerated KAZE (AKAZE) [25], and oriented FAST and rotated BRIEF (ORB) [26] generally detect and describe "feature points" by their local intensity or shape patterns.
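The feature-matching step these detectors feed can be illustrated with a small pure-Python sketch of brute-force Hamming matching of binary descriptors, as used with ORB-style features. Descriptors are shortened to 8-bit integers for illustration; real ORB descriptors are 256-bit strings, and this is not the paper's implementation.

```python
# Illustrative brute-force matching of binary descriptors by Hamming distance.
# Descriptors here are 8-bit integers for brevity; real ORB uses 256 bits.

def hamming(a: int, b: int) -> int:
    """Number of differing bits between two binary descriptors."""
    return bin(a ^ b).count("1")

def match_descriptors(desc1, desc2):
    """For each descriptor in desc1, return (query index, index of its
    nearest neighbor in desc2, Hamming distance)."""
    matches = []
    for i, d1 in enumerate(desc1):
        dists = [hamming(d1, d2) for d2 in desc2]
        j = min(range(len(dists)), key=dists.__getitem__)
        matches.append((i, j, dists[j]))
    return matches

desc1 = [0b1010_1100, 0b0111_0001]
desc2 = [0b1010_1101, 0b0111_0011, 0b0000_0000]
print(match_descriptors(desc1, desc2))  # nearest neighbor per query descriptor
```

In practice such exhaustive matching produces the large pool of putative correspondences that the optimization step below must then filter.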
After the extraction and matching of point pairs, extensive matching points are obtained, including both true and false matches. The initial set of matching points should therefore be optimized to remove false matches; however, in limiting false matches, a large portion of true matches is often eliminated as well [27]. Several techniques have been proposed to solve this problem, for example, the ratio test (RT) and grid-based motion statistics (GMS) [9,28]. The RT identifies high-quality matches by comparing the distances of the two closest candidate points. However, due to the lack of more stringent constraints, many incorrect matches remain difficult to eliminate in complex scenes. GMS is related to optical flow, point-based coherence techniques, and patch-match-based matchers and directly uses motion smoothness to support match estimation [29,30,31]. However, when GMS separates true from false matches, the scale and rotation of matching points are detected and estimated with only five scale and eight rotation discrete grid kernels [32,33,34]. In addition, a series of putative matches is grouped into one grid cell, in which the different scales and rotations of the matches are averaged. Therefore, a large portion of true matching points may be considered false matches and removed.
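The ratio test mentioned above can be sketched in a few lines: a putative match is kept only if its nearest-neighbor distance is clearly smaller than the second-nearest distance. The 0.8 threshold below is a common choice in the literature, not a value from this paper.

```python
# Sketch of the ratio test (RT): keep a match only when the best candidate
# distance is clearly smaller than the second-best one.

def ratio_test(distances, ratio=0.8):
    """distances: candidate distances for one query feature.
    Returns True if the best match passes the ratio test."""
    if len(distances) < 2:
        return False
    d_sorted = sorted(distances)
    return d_sorted[0] < ratio * d_sorted[1]

print(ratio_test([10.0, 50.0, 60.0]))  # distinctive best match -> kept
print(ratio_test([45.0, 50.0, 60.0]))  # ambiguous best match -> rejected
```

As the text notes, this per-match criterion imposes no spatial constraint, which is why ambiguous matches survive in complex scenes.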
For remote sensing images of coastlines, islands, and reefs that contain seafloor pixels, the local self-similarity of the seafloor terrain leads to weak texture features and a lack of salient point features [35]. In addition, the imaging quality in water areas is affected by waves, sun glint, and water properties [36,37]. As a result, traditional matching methods have low matching accuracy, and feature-matching points that are automatically extracted in areas with poor texture still require a significant amount of manual editing before practical application [38]. To overcome these issues in the weak-texture seafloor areas of remote sensing images, the homography-based motion statistics with an epipolar constraint (HMSEC) method is proposed. This method can effectively improve the number, reliability, and robustness of matching points for weak-texture seafloor images by translating a large number of putative matches into high-quality matches and by using homography and epipolar geometry to estimate the scale and orientation influence of the matching points. In combination with ORB, the matching number and accuracy are ensured and enhanced, and the distribution of matching points in the seafloor area is improved with increasing depth. In addition, different experiments were performed using island and reef images to validate the correctness and accuracy of the proposed method, and various matching metrics were used to compare and evaluate the matching results obtained by different methods. Finally, the matching points obtained by the different methods were projected onto the water-depth image to analyze the distribution of the matching results and to estimate the detection capability of HMSEC at different water depths.
2. Methodology
The HMSEC method with ORB was adopted to achieve robust feature matching and to obtain highly reliable matching points based on the neighborhood information of the matching points. The ORB method was used to generate a sufficient number of putative matching points. For each initial putative match generated by ORB, the neighborhood information used in HMSEC was used to determine its correctness and to remove false matching points. In addition, epipolar geometry was used as a constraint with homography-based motion statistics to ensure the reliability of the extracted true matching points. To this end, the proposed method, HMSEC with ORB, has four parts: (1) putative matching, (2) filtering with motion statistics, (3) homography-based scale and orientation adaptation, and (4) matching-metrics evaluation.
Figure 1 shows the flow chart of HMSEC with ORB.
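The four-part pipeline can be outlined as follows; the stage functions are hypothetical placeholders for the procedures detailed in the following subsections, and only the control flow is shown.

```python
# High-level sketch of the HMSEC-with-ORB pipeline; the stage bodies are
# placeholders standing in for the steps detailed in Sections 2.1-2.3.

def putative_matching(image1, image2):
    # (1) ORB detection + brute-force Hamming matching -> putative matches
    return [("p1", "q1"), ("p2", "q2")]

def motion_statistics_filter(matches):
    # (2) keep matches whose motion support exceeds the threshold
    return matches

def homography_adaptation(matches):
    # (3) remove matches violating the 2-sigma coordinate/angle thresholds
    return matches

def evaluate_metrics(matches):
    # (4) compute PMR, MS, SMR, SMN, TMN on the final match set
    return {"TMN": len(matches)}

matches = putative_matching("I1", "I2")
matches = motion_statistics_filter(matches)
matches = homography_adaptation(matches)
print(evaluate_metrics(matches))  # {'TMN': 2}
```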
2.1. Filtering with Motion Statistics
HMSEC relies on the motion smoothness constraint: the assumption that neighboring pixels in one image move together because they often belong to the same object or structure. Although this assumption is not universally correct, e.g., it is violated at object boundaries, it holds for most regular pixels. This is sufficient for our purpose, as we are not targeting a final correspondence solution but a set of high-quality correspondences for RANSAC (random sample consensus)-like approaches. The assumption implies that neighboring true correspondences in one image are also close in the other image, while false correspondences are not. This allows us to classify a correspondence as true or false by simply counting the number of its similar neighbors, i.e., the correspondences that are close to the reference correspondence in both images.
True matches obey the smoothness constraint, while false matches do not. Let similar neighbors refer to those matches that are close to the reference match in both images. True matches often have more similar neighbors than false matches, as shown in Figure 2. To identify true matches, HMSEC uses the number of similar neighbors.
Let $M$ be the set of all matches between images $I_1$ and $I_2$, and let $m_i \in M$ be one match that connects the points $p_i$ and $q_i$ in the two images. We define its neighbors as $N_i = \{ m_j \mid m_j \in M, m_j \neq m_i, d(p_i, p_j) < r \}$ and its similar neighbors as $S_i = \{ m_j \mid m_j \in N_i, d(q_i, q_j) < r \}$, where $d$ refers to the Euclidean distance between two points and $r$ is the neighborhood radius. We term $s_i$, the number of elements in $S_i$, the motion support for $m_i$.
The motion support can be used as a discriminative feature to distinguish true and false matches. Modeling the distribution of $s_i$ for true and false matches, we obtain Equation (1), in which $B$ refers to the binomial distribution and $n$ refers to the number of neighbors of $m_i$:

$$ s_i \sim \begin{cases} B(n, p_t), & \text{if } m_i \text{ is true} \\ B(n, p_f), & \text{if } m_i \text{ is false} \end{cases} \tag{1} $$

The symbol $p_t$ represents the probability that one true match supports its neighbors, which is close to the correct rate of correspondences. The false-match probability is denoted by $p_f$ and is usually small, because false correspondences approximately follow a random distribution in regular pixel regions.
Furthermore, the expectation of $s_i$ can be derived as in Equation (2), and its variance is expressed by Equation (3):

$$ E = \begin{cases} n p_t, & \text{if } m_i \text{ is true} \\ n p_f, & \text{if } m_i \text{ is false} \end{cases} \tag{2} \qquad V = \begin{cases} n p_t (1 - p_t), & \text{if } m_i \text{ is true} \\ n p_f (1 - p_f), & \text{if } m_i \text{ is false} \end{cases} \tag{3} $$

Then, the separability between true and false matches can be defined as $S$ and described by Equation (4):

$$ S = \frac{E_t - E_f}{\sqrt{V_t} + \sqrt{V_f}} = \frac{n p_t - n p_f}{\sqrt{n p_t (1 - p_t)} + \sqrt{n p_f (1 - p_f)}} \propto \sqrt{n} \tag{4} $$

where $S \propto \sqrt{n}$, and if $n \to \infty$, then $S \to \infty$. This indicates that the value of $S$ grows, and the separability becomes increasingly reliable, when the number of feature points is sufficiently large. This holds even when $p_t$ is only slightly larger than $p_f$; as a result, it is possible to obtain reliable matches from difficult scenes by increasing the number of detected feature points.
In addition, Equation (4) shows that improving feature quality (increasing $p_t$) can also boost separability. These distinctive attributes allow us to decide whether $m_i$ is true or false using the threshold $\tau$ given in Equations (5) and (6), where $T$ and $F$ denote the true and false match sets, respectively, and $\tau$ is the threshold on $s_i$:

$$ m_i \in \begin{cases} T, & \text{if } s_i > \tau \\ F, & \text{otherwise} \end{cases} \tag{5} \qquad \tau = \alpha \sqrt{n} \tag{6} $$

The symbol $\alpha$ is a hyperparameter, which is empirically set to range from 4 to 6.
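The motion-support statistic and its threshold can be sketched in pure Python, assuming matches are given as point pairs and neighborhoods are disks of radius r; this is an O(n²) illustration with hypothetical names, not the paper's implementation.

```python
# Minimal sketch of motion support: for each match (p_i, q_i), count
# neighbors within radius r in image 1 whose counterparts also fall within
# r in image 2, then keep matches whose support exceeds alpha * sqrt(n).
import math

def motion_support(matches, r=10.0):
    """matches: list of ((x1, y1), (x2, y2)) pairs.
    Returns (s_i, n_i) per match: similar-neighbor and neighbor counts."""
    def dist(a, b):
        return math.hypot(a[0] - b[0], a[1] - b[1])
    result = []
    for i, (p_i, q_i) in enumerate(matches):
        neighbors = [(p_j, q_j) for j, (p_j, q_j) in enumerate(matches)
                     if j != i and dist(p_i, p_j) < r]
        s_i = sum(1 for _, q_j in neighbors if dist(q_i, q_j) < r)
        result.append((s_i, len(neighbors)))
    return result

def keep_match(s_i, n_i, alpha=5.0):
    """Keep a match when its support exceeds tau = alpha * sqrt(n_i)."""
    return s_i > alpha * math.sqrt(n_i)

# Three coherent matches and one outlier ((5, 5) -> (300, 300)):
matches = [((0, 0), (100, 100)), ((3, 0), (103, 100)),
           ((0, 3), (100, 103)), ((5, 5), (300, 300))]
print(motion_support(matches))  # the outlier receives zero motion support
```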
2.2. Homography-Based Scale and Orientation Adaptation
In the matching process between two images, significant changes in scale and rotation exist in each image [39]. To address this issue, multi-scale and multi-rotation solutions are adopted through the image homography, which describes a projective geometric transformation from one image space to the other. The two images $I_1$ and $I_2$ are shown in Figure 3.
The homography between the two images is described by the matrix $H$, and the homographic transformation is represented by Equation (7):

$$ s \begin{bmatrix} x_2 \\ y_2 \\ 1 \end{bmatrix} = H \begin{bmatrix} x_1 \\ y_1 \\ 1 \end{bmatrix}, \qquad H = \begin{bmatrix} h_1 & h_2 & h_3 \\ h_4 & h_5 & h_6 \\ h_7 & h_8 & h_9 \end{bmatrix} \tag{7} $$

in which the coordinates in $I_1$ and $I_2$ are represented by $(x_1, y_1)$ and $(x_2, y_2)$; $h_1, \ldots, h_9$ represent the nine elements of the matrix $H$, while $h_9$ equals 1; the symbol $s$ is the scale factor between the two images, and $1/s$ is its reciprocal; $A$ denotes the matrix of the camera's internal parameters; and $R$ and $T$ represent the rotation and translation between the two images.
Through Equations (8) and (9), the scale factor $s$ is removed, and the coordinate in image $I_2$ corresponding to the coordinate of a matching point in image $I_1$ can be determined:

$$ x_2 = \frac{h_1 x_1 + h_2 y_1 + h_3}{h_7 x_1 + h_8 y_1 + h_9} \tag{8} \qquad y_2 = \frac{h_4 x_1 + h_5 y_1 + h_6}{h_7 x_1 + h_8 y_1 + h_9} \tag{9} $$

In this process, the rotation between the two images is taken into account and eliminated by the rotation matrix $R$ contained in the matrix $H$.
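The de-homogenization in Equations (8) and (9) can be sketched as follows, assuming the row-major layout h1…h9 of the matrix H; the function name is illustrative.

```python
# Apply a 3x3 homography H (h9 = 1) to a point in image I1 and remove the
# scale factor by de-homogenization to predict the coordinate in image I2.

def apply_homography(H, x1, y1):
    """H: 3x3 nested list [[h1,h2,h3],[h4,h5,h6],[h7,h8,h9]]."""
    w = H[2][0] * x1 + H[2][1] * y1 + H[2][2]          # scale factor s
    x2 = (H[0][0] * x1 + H[0][1] * y1 + H[0][2]) / w   # Equation (8)
    y2 = (H[1][0] * x1 + H[1][1] * y1 + H[1][2]) / w   # Equation (9)
    return x2, y2

# A pure translation homography shifts the point without scaling:
H = [[1.0, 0.0, 5.0],
     [0.0, 1.0, -2.0],
     [0.0, 0.0, 1.0]]
print(apply_homography(H, 10.0, 10.0))  # (15.0, 8.0)
```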
In the process of the homography-based scale and orientation estimation, deviations exist in the coordinates and angles of the matching points. For image $I_2$, the coordinate deviation between each matching point retained after motion statistics and the point predicted by the homography matrix is statistically counted using the standard deviation. Two times the standard deviation ($2\sigma_c$) is used as a radius to construct a circle around each red point obtained by the homography matrix, as represented in Figure 3. When the corresponding matching point after motion statistics falls outside the circle, such as the green point $P_1$, its deviation is greater than two times the standard deviation of the coordinate deviation, and it is an unreliable matching point. In contrast, the matching points $P_2$ and $P_3$ are reliable matching points to be retained. To improve the matching accuracy over the entire image, the angle deviation is also computed between the line connecting each matching point in $I_1$ with its corresponding matching point in $I_2$ and the line connecting it with the point calculated by the homography matrix. Then, two times the standard deviation ($2\sigma_a$) of the angle deviation, shown by the orange dashed line, is calculated and used to evaluate the correctness and reliability of each matching point. Finally, when the coordinate and angle deviations of a matching point exceed the corresponding values of $2\sigma_c$ and $2\sigma_a$, the point is considered unreliable and removed.
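The two-sigma reliability check can be sketched as follows, assuming per-match coordinate and angle deviations have already been computed; in this sketch a match is kept only when both deviations stay within twice the respective standard deviations, and all names are illustrative.

```python
# Sketch of the 2-sigma filter: compute thresholds from the deviation
# populations, then keep matches whose deviations stay within both.
import math

def std(values):
    """Population standard deviation."""
    mean = sum(values) / len(values)
    return math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))

def filter_two_sigma(coord_devs, angle_devs):
    """coord_devs/angle_devs: per-match deviations; returns kept indices."""
    t_c = 2.0 * std(coord_devs)   # 2-sigma coordinate threshold
    t_a = 2.0 * std(angle_devs)   # 2-sigma angle threshold
    return [i for i in range(len(coord_devs))
            if coord_devs[i] <= t_c and angle_devs[i] <= t_a]

# Three consistent matches and one gross outlier (index 3):
print(filter_two_sigma([0.1, 0.2, 0.1, 5.0], [0.5, 0.4, 0.6, 10.0]))
```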
2.3. Matching-Metrics Evaluation
To evaluate the reliability and accuracy of HMSEC for nearshore remote sensing images, five matching metrics were applied to the results of the various algorithms and their comparisons. This comprehensive set of metrics allowed us to embed our analysis into the existing body of work. The standard metrics were proposed for binary descriptor evaluation and are used to assess the accuracy and reliability of feature-descriptor matching [17]: the putative match ratio (PMR) and matching score (MS) describe the raw matching performance of each image pair. Furthermore, a set of additional metrics, the seafloor match ratio (SMR), seafloor matching number (SMN), and total matching number (TMN), was adopted to evaluate the image-matching results for the nearshore area via comparison with other methods and analysis of the HMSEC characteristics.
The following notations are used to illustrate the metrics in this paper: $M_p$ denotes all putative matches across two images; $F$ denotes all detected features; $M_i$ denotes all inlier matches across two images; $M_e$ denotes all matches across two images under the epipolar constraint; and $M_s$ denotes all seafloor matches across two images under the epipolar constraint. The PMR quantifies the selectivity of the descriptor in terms of the fraction of detected features initially identified as key-point matches and is formulated as $|M_p| / |F|$; this ratio is generally utilized to estimate the raw matching performance for each image. The MS is formulated as $|M_i| / |F|$ to represent the inlier ratio among all detected features, which describes the robustness and transformation efficiency of the feature points. The SMR is represented by $|M_s| / |M_e|$ and denotes the percentage of seafloor matching points among all matching points; a higher value expresses a better seafloor matching result.
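Given the set sizes defined above, the ratio metrics reduce to simple divisions. The function below is an illustrative sketch; treating TMN as the size of the epipolar-constrained match set is an assumption here, and all parameter names are hypothetical.

```python
# Sketch of the matching metrics computed from match-set sizes.

def matching_metrics(n_features, n_putative, n_inliers, n_epipolar, n_seafloor):
    pmr = n_putative / n_features   # putative match ratio (PMR)
    ms = n_inliers / n_features     # matching score (MS, inlier ratio)
    smr = n_seafloor / n_epipolar   # seafloor match ratio (SMR)
    return {"PMR": pmr, "MS": ms, "SMR": smr,
            "SMN": n_seafloor, "TMN": n_epipolar}

m = matching_metrics(n_features=10000, n_putative=4000,
                     n_inliers=3000, n_epipolar=2500, n_seafloor=1000)
print(m)  # PMR = 0.4, MS = 0.3, SMR = 0.4
```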
In image-matching research, the metrics described above are generally used for the overall evaluation of matching results. In addition, visual inspection is generally used to compare the locations of matching points with true ground points or to compare the relative locations of matching points in the left and right images. However, islands and reefs located far from land are difficult for humans to access, and consequently, matching points in images can rarely be compared in situ with ground points on the seafloor.
5. Discussion
The results of the various matching metrics (Table 2, Table 3 and Table 4) indicate that the PMR obtained using the ORB-HMSEC method is at an intermediate level because the number of features detected by ORB is notably larger than that detected by the other methods. The MS value obtained using the ORB-HMSEC method was the highest for the three images, indicating that the number of putative matching points and the conversion rate from features to putative matches were the highest. The TMN values for the three study areas were determined using the various methods. The results show that the ORB-HMSEC method was significantly better than the other tested methods, reaching 3010, 5348, and 6369 for Zhaoshu Island, Ganquan Island, and Lingyang Reef, respectively; relative to the second-best values, the matching numbers improved by 2317, 2767, and 1346, respectively. Thus, the results demonstrate that numerous putative matching points can be transformed efficiently into reliable matching points using the motion smoothness assumption and motion statistics.
Compared with the ORB-GMS method, the ORB-HMSEC method achieved higher SMN and TMN values. In the GMS method, only five relative scales and eight relative rotations between image pairs are defined and considered for the matching estimation of scale and rotation (Figure 8). Furthermore, the grid-based framework divides the two images into non-overlapping cells to reduce the computational complexity, and only cell pairs, not corresponding match points, are classified (Figure 8a,b). The grid kernel is fixed on one image, and eight different grid kernels are used to simulate possible relative rotations [32], while five different grid kernels are used to simulate the potential relative scales in the other image. Therefore, during the matching process, the scale and rotation of the image pairs must be discretized into rough simulations and estimations, which results in errors and the elimination of matching points [47,48]. To overcome this problem, a homography-based adaptation method was proposed and used in the HMSEC method, in which coordinate and angle deviations are used to accurately estimate the scale and rotation of each matching point (Figure 3). These results demonstrate that ORB-HMSEC has higher filtering and estimation accuracy for matching points, and the number of matching points can be confirmed and improved (Table 6).
In underwater areas, the image texture becomes weaker as the water depth increases. With the ORB-HMSEC method, the distribution of seafloor matching points becomes wider and their number increases (Figure 6 and Figure 7). The maximum water depth of the matching points also improved, reaching −11.66, −14.06, and −9.61 m at Zhaoshu Island, Ganquan Island, and Lingyang Reef, respectively. The number of seafloor matching points first increased with increasing water depth (Figure 7), peaking between −3 and −4 m. In shallower areas ranging from 0 to −2 m, the wavelength and quantity of whitecaps and breaking waves increased, which may have reduced the number of matching points; beyond the peak, increasing depth reduced the image texture and thus the matching number. Overall, with the ORB-HMSEC method, the number of matching points improved with increasing water depth, so the depth range and distribution of underwater points could be extended.
Image homography was used to eliminate the matching error caused by multi-scale and multi-rotation changes in each image. In this procedure, two times the standard deviation of the coordinate and angle deviations was adopted to further reduce the matching error and to ensure the reliability and stability of the matching results in most situations. As shown in Figure 9, different values of the coordinate and angle thresholds were used to obtain matching points and analyze the trend of the matching-point number. For the study areas of Zhaoshu Island, Ganquan Island, and Lingyang Reef, the values of $2\sigma_c$ are 0.96, 1.12, and 0.75 pixels, and the corresponding values of $2\sigma_a$ are 0.36, 1.44, and 1.80 degrees, respectively. The results in Figure 9 show that the number of matching points changes relatively smoothly when the coordinate and angle thresholds range from one to two standard deviations. These results demonstrate that the accuracy and reliability of the matching points can be further evaluated and improved using the adaptation constraint with the homography-based scale and orientation. Seafloor texture varies between island and reef locations, and in relatively turbid and deep-water environments, the radiometric information and image texture are reduced. In such situations, the thresholds on the coordinate deviation ($2\sigma_c$) and angle deviation ($2\sigma_a$) could be tightened from two times toward one time the standard deviation to ensure the matching accuracy and reliability of the ORB-HMSEC method.
6. Conclusions
This study proposes a novel matching method, ORB-HMSEC, that uses homography-based motion statistics with an epipolar constraint. The proposed method efficiently transforms numerous putative matching points into reliable matching points using the motion smoothness assumption and motion statistics. In this process, the scale and rotation of each matching point in the image pairs can be estimated accurately to improve the accuracy and reliability of the matching points. According to the experimental results and comparisons with other methods, the ORB-HMSEC method efficiently improves the reliability and number of matching points in both seafloor and land regions. The increments of the matching points reached 2672, 2767, and 1346 for the WorldView-2 multispectral images of Zhaoshu Island, Ganquan Island, and Lingyang Reef, respectively. Furthermore, the seafloor matching points of ORB-HMSEC had wider distributions and reached deeper water depths of −11.66, −14.06, and −9.61 m, respectively. Thus, the ORB-HMSEC method is highly reliable, with a large number of matching points in the study areas. In the future, we plan to conduct matching experiments using various remote sensing images to further explore the potential of the proposed method in different environments.