5.1. Metrics Used
The accuracy of the detection algorithms was gauged using three primary metrics: precision, recall, and the F1 score. The F1 score, the harmonic mean of precision and recall, weights the lower-scoring of the two more heavily. It serves as a general indicator of an algorithm’s performance and is the better-suited summary metric because it accounts for class imbalance. Secondary metrics, namely true positives (TPs), false positives (FPs), and false negatives (FNs), were used to compute these primary metrics.
In object detection, TPs denote correctly identified RSOs, FPs represent stars misidentified as RSOs, and FNs are RSOs that the algorithms failed to identify. In streak detection, TPs correspond to annotated values that fall within detected regions, FPs indicate detected regions lying outside the annotated values, and FNs are annotated values situated outside the detected regions.
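The relationship between the counts above and the primary metrics can be sketched in a few lines. The function below is purely illustrative (it is not the authors' implementation, and the example counts are invented, not taken from the paper's sequences):

```python
def detection_metrics(tp: int, fp: int, fn: int) -> dict:
    """Compute precision, recall, and F1 from raw detection counts.

    precision = TP / (TP + FP): fraction of detections that are real RSOs.
    recall    = TP / (TP + FN): fraction of real RSOs that were detected.
    F1 is the harmonic mean of the two, pulled toward the lower score.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"precision": precision, "recall": recall, "f1": f1}

# Illustrative counts only:
print(detection_metrics(tp=80, fp=20, fn=40))
```

Because F1 is a harmonic mean, a large gap between precision and recall drags F1 toward the weaker metric, which is why the per-sequence F1 scores discussed below track the recall deficits so closely.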
5.2. Results of Object Detection
AFD performed optimally when there was camera movement because the displacement of stars between successive frames was typically minimal. This was an advantage over MFD, which subtracted non-consecutive frames taken at different times, so stars could be misaligned during differencing. For longer sequences, or those containing camera jitter, AFD is therefore the recommended algorithm. MFD's edge over AFD was its ability to retain the shape of RSOs after differencing: in AFD, stars and RSOs overlapped in adjacent frames before subtraction, reducing the distinction between static and moving objects in the differenced frame. Because MFD misidentified fewer stars as RSOs, it is the preferred algorithm when higher precision is desired. PFT offered a significant advantage over the other techniques by substituting size filtering with tracking, which was more effective at filtering out small hot pixels present in a single frame. Moreover, because PFT omitted frame differencing, it avoided the visual artifacts introduced by the subtraction steps in AFD and MFD, artifacts that require additional filtering layers to remove. Hence, PFT is the preferred algorithm for greater recall, as fewer filtering stages minimize the loss of true RSOs during preprocessing. The overall performance of these techniques is tabulated in Table 3.
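The contrast between the two differencing schemes can be sketched with a toy NumPy example (a simplified illustration, not the authors' pipeline; the frame sizes and intensities are invented). Subtracting adjacent frames cancels nearly static stars but lets a slow RSO overlap its own previous position, whereas a larger frame offset separates the RSO's positions at the cost of greater star misalignment under camera motion:

```python
import numpy as np

def frame_difference(frames: np.ndarray, offset: int = 1) -> np.ndarray:
    """Absolute difference between frames separated by `offset`.

    offset=1 mimics adjacent-frame differencing (AFD-style); a larger
    offset mimics differencing non-consecutive frames (MFD-style).
    """
    return np.abs(frames[offset:].astype(np.int16)
                  - frames[:-offset].astype(np.int16)).astype(np.uint8)

# Toy sequence: a static "star" and an RSO drifting one pixel per frame.
frames = np.zeros((4, 8, 8), dtype=np.uint8)
frames[:, 2, 2] = 200                  # static star in every frame
for t in range(4):
    frames[t, 5, 1 + t] = 150          # moving RSO
adjacent = frame_difference(frames, offset=1)
print(adjacent[0, 2, 2])   # the static star cancels to 0
```

In this toy case the differenced frame contains residue at both the RSO's old and new positions, which is one source of the subtraction artifacts that AFD and MFD must filter out.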
In terms of overall performance, PFT outperformed both AFD and MFD with an F1 score of 82%, while AFD had the lowest performance at 68%, primarily because its precision of 73% fell well below that of MFD and PFT, both at 95%. Recall presented challenges for all three algorithms, regardless of their processing techniques. Testing without any filtering during processing yielded maximum recall scores of only 71%, 75%, and 87% for the respective sequences. These results show that the preprocessing phase significantly limited recall: even without constraints during processing, recall did not approach 100%. This suggests that preprocessing involving thresholding and normalization filters out faint RSOs along with noise, lowering recall.
The three sequences analyzed each presented distinct challenges, and the performance of the detection algorithms varied accordingly. In the first sequence, which featured a mix of faint and prominent RSOs, MFD excelled in precision but struggled with recall because of its extensive filtering layers. Conversely, in the second sequence, composed primarily of faint RSOs, all algorithms suffered recall losses because preprocessing thresholding removed many faint RSOs, forcing a trade-off between recall and filtering. In the third sequence, dominated by prominent RSOs, MFD marginally outperformed PFT thanks to its aggressive filtering, achieving higher precision and, somewhat unexpectedly, higher recall; the latter was possibly attributable to PFT's tracking inconsistencies with RSOs covering substantial distances between successive frames.
AFD performed the worst in all three sequences, primarily because of its poor precision relative to the other algorithms. This resulted from RSOs and stars appearing nearly identical after subtraction, which made it difficult to differentiate static from moving objects. Additionally, intermittent periods of long frame times caused RSOs to be detected twice, as objects moved too far between adjacent frames for their bounding boxes to be combined. This variability in frame time is difficult to account for during processing, as it is inherent to the dataset.
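The double-detection failure mode can be illustrated with a simple distance-based box merge (an illustrative rule under an assumed merge radius, not the authors' exact combination criterion): detections are combined only when their box centers fall within the radius, so a long frame time that pushes the same RSO's detections apart leaves two separate boxes.

```python
def merge_close_boxes(boxes, max_dist):
    """Greedily merge axis-aligned boxes (x0, y0, x1, y1) whose centers
    lie within `max_dist` of each other; boxes farther apart stay
    separate, which is how one RSO can be counted twice."""
    def center(b):
        return ((b[0] + b[2]) / 2, (b[1] + b[3]) / 2)
    merged = []
    for box in boxes:
        cx, cy = center(box)
        for i, m in enumerate(merged):
            mx, my = center(m)
            if ((cx - mx) ** 2 + (cy - my) ** 2) ** 0.5 <= max_dist:
                merged[i] = (min(m[0], box[0]), min(m[1], box[1]),
                             max(m[2], box[2]), max(m[3], box[3]))
                break
        else:
            merged.append(box)
    return merged

# Two detections of the same RSO whose centers sit 30 px apart:
boxes = [(10, 10, 14, 14), (40, 10, 44, 14)]
print(len(merge_close_boxes(boxes, max_dist=35)))  # 1: combined
print(len(merge_close_boxes(boxes, max_dist=20)))  # 2: double-counted
```

Any fixed radius tuned for the typical frame cadence fails during the intermittent long-frame periods, which is why the problem is hard to solve at processing time.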
PFT emerged as the best-performing method overall, as it relied on proximity filtering rather than frame differencing to detect RSOs. This improved its recall, since fewer true RSOs were filtered out during processing. Frame differencing algorithms such as AFD and MFD rely heavily on filtering to suppress the visual artifacts that remain after subtraction, which explains their comparably poor recall. In addition, rudimentary tracking proved useful for filtering out objects with limited movement, rather than relying on shape, size, and brightness, as AFD and MFD do; those characteristics are unreliable for RSO identification.
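The idea of filtering by movement rather than appearance can be sketched as a toy nearest-neighbour tracker (an assumed scheme with invented parameters, not the PFT implementation): detections are linked across frames by proximity, and tracks whose overall displacement stays small are discarded as static stars or hot pixels.

```python
import math

def movement_filter(frames_of_centroids, link_dist, min_travel):
    """Link detections across frames by nearest-neighbour proximity and
    keep only tracks that travel at least `min_travel` pixels overall.
    Static stars and single-frame hot pixels are rejected without any
    shape, size, or brightness filtering."""
    tracks = [[c] for c in frames_of_centroids[0]]
    for detections in frames_of_centroids[1:]:
        for track in tracks:
            last = track[-1]
            best = min(detections, default=None,
                       key=lambda d: math.dist(last, d))
            if best is not None and math.dist(last, best) <= link_dist:
                track.append(best)
    return [t for t in tracks
            if math.dist(t[0], t[-1]) >= min_travel]

# One static star and one RSO drifting one pixel per frame:
frames = [[(2.0, 2.0), (5.0, 1.0 + t)] for t in range(4)]
moving = movement_filter(frames, link_dist=2.0, min_travel=2.5)
print(moving)  # only the drifting track survives
```

Note that this sketch also exposes the weakness reported above: if an RSO covers more than `link_dist` between successive frames, its track breaks and the detection is lost, consistent with PFT's tracking inconsistencies on fast-moving RSOs.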
5.3. Results of Streak Detection
In Figure 9, we present the streaks detected in all three image sequences, totaling twenty streaks. These images resulted from stacking the output from each set within its respective sequence.
Table 4 presents the mean streak length in pixels and the signal-to-background ratio (SBR) for each sequence. A streak caused by the release of the ballast is visible in the green box on the left side of the first sequence. Although this streak is relatively narrow, it meets the area-based threshold for detection. The precision, recall, F1 score, and accuracy values for the sequences are tabulated in Table 5.
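The area-based detection threshold can be sketched as follows (illustrative NumPy/SciPy with an assumed intensity threshold and minimum area, not the paper's parameters): the stacked image is binarized, connected regions are labeled, and only regions whose pixel area meets the minimum are kept as streaks, which is why a narrow ballast streak still qualifies when it is long enough.

```python
import numpy as np
from scipy import ndimage

def detect_streaks(stacked, intensity_thresh, min_area):
    """Label connected bright regions in a stacked image and return the
    bounding slices of those whose pixel area meets `min_area`."""
    binary = stacked > intensity_thresh
    labels, n = ndimage.label(binary)
    streaks = []
    for i, region in enumerate(ndimage.find_objects(labels), start=1):
        if np.count_nonzero(labels[region] == i) >= min_area:
            streaks.append(region)
    return streaks

# Toy stacked frame: a long narrow streak and an isolated hot pixel.
img = np.zeros((20, 20))
img[5, 2:18] = 1.0     # 16-pixel streak, only 1 pixel wide
img[12, 12] = 1.0      # single hot pixel, area 1
found = detect_streaks(img, intensity_thresh=0.5, min_area=5)
print(len(found))  # 1: only the streak passes the area threshold
```

An area criterion of this kind also explains the recall behaviour discussed below: faint streaks whose pixels fall under the intensity threshold never reach the labeling stage at all.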
Overall, the algorithm performed worst in the first sequence, where it achieved an F1 score of 65%. This deficiency can be attributed to low recall, especially in sequences featuring ballast streaks and faint RSOs. Notably, a precision of 100% was achieved. This may be a consequence of the specific definitions of TPs and FPs, which were customized for streak detection while the manual annotations were made for object detection. The algorithm may also have overperformed because it was designed to perform optimally on the stable data from RSONAR, prioritizing simplified image processing and detection accuracy. The relatively low recall values further suggest that the algorithm struggled to identify RSO streaks falling below the noise threshold.