Article

Infrared Image Detection and Recognition of Substation Electrical Equipment Based on Improved YOLOv8

1 School of Electrical Engineering, Dalian University of Technology, Dalian 116024, China
2 College of Information Engineering, Zhejiang University of Technology, Hangzhou 310000, China
* Author to whom correspondence should be addressed.
Appl. Sci. 2025, 15(1), 328; https://doi.org/10.3390/app15010328
Submission received: 11 November 2024 / Revised: 26 December 2024 / Accepted: 30 December 2024 / Published: 31 December 2024
(This article belongs to the Special Issue Signal and Image Processing: From Theory to Applications)

Abstract

To address the challenges associated with lightweight design and small object detection in infrared imaging for substation electrical equipment, this paper introduces an enhanced YOLOv8_Adv network model. This model builds on YOLOv8 through several strategic improvements. The backbone network incorporates PConv and FasterNet modules to substantially reduce the computational load and memory usage, thereby achieving model lightweighting. In the neck layer, GSConv and VoVGSCSP modules are utilized for multi-stage, multi-feature map fusion, complemented by the integration of the EMA attention mechanism to improve feature extraction. Additionally, a specialized detection layer for small objects is added to the head of the network, enhancing the model’s performance in detecting small infrared targets. Experimental results demonstrate that YOLOv8_Adv achieves a 4.1% increase in mAP@0.5 compared to the baseline YOLOv8n. It also outperforms five existing baseline models, with the highest accuracy of 98.7%, and it reduces the computational complexity by 18.5%, thereby validating the effectiveness of the YOLOv8_Adv model. Furthermore, the effectiveness of the model in detecting small targets in infrared images makes it suitable for use in areas such as infrared surveillance, military target detection, and wildlife monitoring.

1. Introduction

The substations and related equipment of the state grid play a crucial role in the power transmission and distribution system, being regarded as core electrical assets. Ensuring the normal operation of this equipment is not only an important task for the power industry but also a key factor in maintaining the stability of the power grid. Since these electrical devices are exposed to natural environments for extended periods, they face challenges from extreme weather conditions, such as strong winds and heavy rain. These weather phenomena can lead to damage or failure of the equipment, adversely affecting the normal operation of transmission and distribution lines. Therefore, real-time monitoring of the status of electrical equipment has become an indispensable task in power inspections [1,2,3].
Compared to traditional manual inspection methods, drone inspections offer significant advantages. Firstly, drones provide high flexibility, enabling them to quickly adapt to complex inspection requirements. Secondly, their high safety standards reduce the risks associated with manual live-line operations. Finally, their cost-effectiveness helps alleviate the high labor intensity caused by the widespread distribution of power lines. These advantages make drone inspections highly valuable in the power industry, as they not only improve inspection efficiency but also ensure the safe operation of the power grid, ultimately contributing to the modernization of the power system [4,5,6].
Recent studies primarily focus on using drones to collect inspection images and applying deep learning to detect the operational status of electrical equipment [7,8,9]. Wu et al. [10] present a novel object detection method that improves the Faster R-CNN algorithm [11], integrating the InResNet architecture to boost feature extraction performance, substituting normalization techniques and activation functions, and implementing a dense connection framework. These modifications lead to a notable enhancement in both detection accuracy and recall for infrared images of electrical devices, and experimental results indicate that the proposed algorithm surpasses conventional detection methods on several performance metrics. Ou et al. [12] also optimize the Faster R-CNN model, specifically enhancing the feature extraction part of VGG16 by reducing the number of high-level convolutions to accelerate training and testing, and introducing anchors with different aspect ratios to improve detection accuracy for elongated devices. Experimental results indicate that the improved model gains in both accuracy and speed and is robust to noise and brightness interference.
Similarly, Zhang et al. [13] introduce an improved object detection model, YOLO-SD, specifically addressing the low accuracy and lack of robustness in detecting equipment defects in substations. Through the integration of the C3+ feature extraction module and the SimAM attention mechanism, along with a specialized fusion loss function, NWD-CIoU, the YOLO-SD model demonstrates exceptional accuracy and robustness in detecting substation equipment defects, thereby offering solid support for the deployment of intelligent online inspection systems. Liu et al. [14] propose a YOLOv4-based method for recognizing abnormal infrared images of electrical equipment. They construct four datasets of electrical equipment images and analyze, through experiments, the impact of relevant factors on detection effectiveness, ultimately establishing the optimal detection model. The findings reveal that YOLOv4 excels in identifying anomalies within infrared images of electrical equipment and shows strong potential for application in the infrared inspection of substation devices.
Moreover, Shan et al. [15] tackle the difficulties in infrared image object detection, including small object dimensions, insufficient feature representation, and background noise, through targeted enhancements to the Single Shot MultiBox Detector (SSD) framework [16]: optimizing the anchor boxes and applying lightweight processing to the network, which successfully improves detection accuracy and precision. In response to similar challenges in small-target infrared image detection, Ding et al. [17] propose a method that removes low-resolution layers from the network, enhances high-resolution layers, and introduces an adaptive pipe filter (APF), combined with a two-stage detection approach. Experimental results in complex scenarios demonstrate that this method achieves high recall and accuracy for detecting and tracking infrared small targets.
The above studies indicate that deep learning-based methods are highly feasible for the detection and recognition of infrared images in electrical inspection. However, several important issues remain to be explored in depth. For instance, achieving model lightweighting while preserving detection accuracy, so as to accommodate the hardware limitations of inspection drones, remains an open problem. Moreover, ensuring real-time performance and efficiently minimizing network parameters to enhance detection speed remains a significant challenge.
To tackle these issues, this study introduces the enhanced YOLOv8_Adv model designed for the intelligent detection and recognition of infrared substation electrical equipment images. Firstly, the model introduces the FasterNet module with partial convolution (PConv) into the YOLOv8 backbone, significantly reducing computation and memory access, achieving model lightweighting. Secondly, generalized-sparse convolution (GSConv) and VoVNet with GSConv and CSPNet (VoVGSCSP) modules are added to the neck, facilitating the multi-stage fusion of feature maps and integrating the EMA attention mechanism to improve the model’s capability in capturing and representing essential features. Lastly, an additional upsampling layer is introduced in the head, with a detection scale of 160 × 160 × 32, to improve small target detection accuracy in infrared backgrounds. The improved YOLOv8_Adv model achieves lightweight design and high accuracy in infrared image detection and recognition for electrical equipment. It effectively replaces traditional manual inspections, better meets real-time requirements, and is suitable for deployment on mobile drone devices.

2. YOLOv8 Framework

The YOLOv8 model, introduced by Ultralytics (the developers of YOLOv5) in early 2023, represents a significant advancement in the You Only Look Once (YOLO) family of object detection models. Building upon the efficient, real-time detection capabilities of its predecessors, YOLOv8 incorporates several technical innovations that markedly enhance its performance in object detection tasks [18,19].
The model employs a novel backbone network, a refined loss function, and an anchor-free detection head, resulting in notable improvements in precision, speed, and flexibility. Particularly in real-time detection scenarios, YOLOv8 is capable of delivering outstanding performance even under limited computational resources.
As illustrated in Figure 1, the YOLOv8 architecture is composed of three primary parts: the backbone, the neck, and the head. The backbone employs the CSPDarkNet structure [20], which consists of various layers such as Conv, C2f, and Spatial Pyramid Pooling Fast (SPPF) to extract image features. The Conv layers are responsible for extracting fundamental features, utilizing SiLU activation function alongside batch normalization to normalize feature representations. The C2f module further captures detailed features to enhance the model’s ability to discern finer details, while the SPPF layer aggregates information from different scales through multi-scale pooling, strengthening the model’s robustness when detecting objects of varying sizes.
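To make this building block concrete, the following is a minimal PyTorch sketch of the basic Conv module just described (Conv2d followed by batch normalization and SiLU); the channel and kernel arguments are illustrative assumptions, not the exact Ultralytics implementation.

```python
import torch
import torch.nn as nn

class Conv(nn.Module):
    """Minimal sketch of YOLOv8's basic Conv block: Conv2d -> BatchNorm2d -> SiLU."""
    def __init__(self, c_in: int, c_out: int, k: int = 3, s: int = 1):
        super().__init__()
        self.conv = nn.Conv2d(c_in, c_out, k, s, padding=k // 2, bias=False)
        self.bn = nn.BatchNorm2d(c_out)   # normalizes feature representations
        self.act = nn.SiLU()              # SiLU activation, as described above

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.act(self.bn(self.conv(x)))
```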
Following the backbone, the neck section, comprising the Upsample, Concat, and C2f modules, is designed to further extract and fuse multi-scale feature maps, which bolsters the model’s performance in multi-scale object detection tasks. The Upsample layer enlarges low-resolution feature maps to align them with high-resolution ones, while the Concat layer merges features from different levels, enabling the model to simultaneously process large, medium, and small objects. The C2f module further integrates multi-scale information to ensure that feature maps carry rich contextual information, ultimately enhancing detection performance. Lastly, the head section, composed of Conv, Concat, C2f, and detect layers, processes the features further and outputs the object’s position and class information. The detect layer, serving as the output layer, generates the final bounding boxes and class labels with high accuracy and localization capability.
YOLOv8 demonstrates outstanding performance across various object detection tasks and is currently one of the state-of-the-art (SOTA) models especially suited for efficient detection in resource-constrained environments [21]. Thanks to its structural innovations, YOLOv8 can maintain excellent detection precision and speed in real-time applications, making it suitable for edge computing, intelligent security, autonomous driving, and other scenarios that demand high responsiveness [22,23,24]. Despite its superior performance in general detection tasks, YOLOv8 exhibits certain limitations in specific application scenarios, such as the infrared image detection of electrical equipment, where it struggles with detecting distant and small objects [25]. This limitation primarily arises from the low contrast and high noise background characteristics of infrared images. To address these challenges, this study introduces structural modifications and optimizations to the base YOLOv8 model, aimed at improving its detection performance to better meet the engineering requirements of infrared image detection in electrical equipment monitoring.

3. YOLOv8_Adv Model

To enable the YOLOv8 network to better complete the detection task, this paper makes changes and replacements of varying degrees to the basic framework introduced above. Specifically, the backbone is improved first, replacing the original C2f module with the C2f-fast module; the GSConv and VoVGSCSP modules are then introduced into the neck, together with the EMA attention mechanism; finally, a dedicated small-target detection layer is added to the head.

3.1. Replacing the C2f Module in the Backbone

Collecting, detecting, and recognizing infrared images on mobile platforms often requires considerable battery power, computational resources, and time. As a result, developing a lightweight model has emerged as a key focus for enhancement. This shift aims to optimize performance while minimizing resource consumption, ultimately facilitating more efficient and effective processing of infrared imagery on mobile devices. To further reduce the size of the YOLOv8 model and improve computational efficiency, this paper replaces the C2f module in the original network with a lightweight C2f-fast module. Specifically, the bottleneck module in the C2f structure is replaced by the FasterNet block, effectively reducing the model size. The network structure is shown in Figure 2.
FasterNet is a new type of neural network introduced at the 2023 CVPR conference. Its lightweight design enables faster detection speed and higher accuracy across various devices, including GPUs, CPUs, and ARM processors. The main benefit of FasterNet stems from its PConv technology, which enhances the extraction of spatial features by minimizing unnecessary computations and improving memory utilization [26]. The design of the network is presented in Figure 3.
PConv extracts spatial features by performing regular convolution on a subset of input channels while keeping the other channels unchanged. When handling continuous or periodic memory access, the first or last consecutive channels are selected as representatives of the feature map. By introducing PConv technology, the FasterNet block significantly contributes to spatial feature extraction and enhances computational efficiency. The corresponding FLOPs for PConv are outlined below:
$\mathrm{FLOPs} = H \times W \times K^{2} \times C_{p}^{2}$
where $H$ and $W$ refer to the height and width of the feature map, respectively; $K$ represents the size of the convolution kernel; and $C_p$ indicates the number of channels actually convolved. Writing $r = C_p / C$ for the ratio of channels computed in each convolution operation, typically $r = 0.25$, the FLOPs of PConv are only $r^2 = 1/16$ of those required for standard convolution, whose cost is $H \times W \times K^{2} \times C^{2}$.
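As a concrete illustration, here is a minimal PyTorch sketch of the PConv idea: a regular convolution over only the first $C_p = rC$ channels, with the remaining channels passed through unchanged. The default ratio of 0.25 follows the description above; everything else is an illustrative assumption rather than the exact FasterNet implementation.

```python
import torch
import torch.nn as nn

class PConv(nn.Module):
    """Partial convolution sketch: regular conv on the first c_p channels,
    identity on the remaining channels."""
    def __init__(self, channels: int, kernel_size: int = 3, ratio: float = 0.25):
        super().__init__()
        self.cp = int(channels * ratio)   # channels actually convolved
        self.conv = nn.Conv2d(self.cp, self.cp, kernel_size,
                              padding=kernel_size // 2, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x1, x2 = x[:, :self.cp], x[:, self.cp:]   # split along the channel dim
        return torch.cat((self.conv(x1), x2), dim=1)
```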
In this paper, the FasterNet block with PConv is integrated into the feature extraction backbone, which notably decreases the computational load and memory access. This enhancement contributes to a more lightweight model overall.

3.2. Importation of the GSConv and VoVGSCSP Modules in Neck Section

In the deployment of models for infrared image processing of substation equipment, both model size and detection accuracy are crucial factors [27,28]. On the one hand, due to the typically large model size, deployment is challenging and detection times are prolonged, failing to meet real-time requirements. On the other hand, lightweight networks, such as EfficientNet [29], MobileNet [30], and SqueezeNet [31], have significantly reduced the computational load by using depthwise separable convolutions, which decrease the convolution kernel parameters. However, this speed gain comes at the cost of reduced detection accuracy.
In order to overcome these challenges, this paper presents a more efficient and compact GSConv structure, as illustrated in Figure 4. The input feature map is first downsampled using standard convolution to generate a feature map with half the number of channels as the final output feature map. Then, a depthwise convolution (DWConv) module processes these feature maps, and the two processed results are concatenated along the channel dimension. Finally, a shuffle operation is applied to produce the final output feature map. The main advantage of the GSConv structure is its ability to maintain high detection accuracy while providing faster processing speed and a more compact model size.
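A minimal PyTorch sketch of this GSConv structure follows (standard convolution to half the output channels, a depthwise convolution on the result, concatenation, then a channel shuffle); the channel counts and kernel sizes are assumptions for illustration.

```python
import torch
import torch.nn as nn

class GSConv(nn.Module):
    """Sketch of GSConv: standard conv -> depthwise conv -> concat -> shuffle."""
    def __init__(self, c_in: int, c_out: int, k: int = 1, s: int = 1):
        super().__init__()
        c_half = c_out // 2
        self.conv = nn.Sequential(
            nn.Conv2d(c_in, c_half, k, s, k // 2, bias=False),
            nn.BatchNorm2d(c_half), nn.SiLU())
        self.dwconv = nn.Sequential(
            nn.Conv2d(c_half, c_half, 5, 1, 2, groups=c_half, bias=False),
            nn.BatchNorm2d(c_half), nn.SiLU())

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x1 = self.conv(x)                            # c_out/2 channels
        y = torch.cat((x1, self.dwconv(x1)), dim=1)  # concat along channels
        b, c, h, w = y.shape                         # shuffle: interleave halves
        return y.view(b, 2, c // 2, h, w).transpose(1, 2).reshape(b, c, h, w)
```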
Furthermore, the VoVGSCSP module integrates the characteristics of VoVNet [32], the advantages of GSConv, and a cross-stage partial network (CSPNet) [33]. This combination reduces the computation and parameter count while effectively preserving the feature map’s representational capacity, enhancing the model’s computational efficiency. By incorporating the CSP module, this design concatenates feature maps across multiple stages, effectively retaining original information and enhancing the model’s generalization ability through cross-stage feature fusion. The network structure is shown in Figure 5.
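The compact sketch below shows one plausible VoVGSCSP layout under this description, reusing the Conv and GSConv sketches above: a GSConv bottleneck path and a cross-stage shortcut path are concatenated and fused. The exact channel split is an assumption.

```python
class VoVGSCSP(nn.Module):
    """CSP-style sketch: GSConv bottleneck path + cross-stage shortcut, then fuse."""
    def __init__(self, c_in: int, c_out: int):
        super().__init__()
        c_mid = c_out // 2
        self.cv1 = Conv(c_in, c_mid, k=1)       # entry of the bottleneck path
        self.cv2 = Conv(c_in, c_mid, k=1)       # cross-stage shortcut path
        self.gs1 = GSConv(c_mid, c_mid, k=3)
        self.gs2 = GSConv(c_mid, c_mid, k=3)
        self.cv3 = Conv(2 * c_mid, c_out, k=1)  # fuse the concatenated paths

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        y1 = self.gs2(self.gs1(self.cv1(x)))
        y2 = self.cv2(x)
        return self.cv3(torch.cat((y1, y2), dim=1))
```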

3.3. Integrating the EMA Attention Mechanism

To enhance dependency relationships between pixels, researchers have introduced various attention mechanisms in convolutional neural networks, such as CBAM and SE, which show notable advantages when combining cross-dimensional attention weights with input features. However, these attention modules often rely on extensive pooling operations, significantly increasing computational cost. In order to resolve this problem, the efficient multi-scale attention (EMA) mechanism has been introduced [34], with its network structure illustrated in Figure 6.
In order to preserve channel information while minimizing computational costs, the EMA mechanism reorganizes certain channels into the batch dimension and divides the channel dimension into several sub-feature groups, ensuring an even distribution of spatial-semantic features across each group. More specifically, EMA not only recalibrates the channel weights in each parallel branch by incorporating global information but also aggregates the output features of the two parallel branches through cross-dimensional interactions, effectively capturing pairwise pixel-level relationships. On this basis, this paper integrates the EMA attention mechanism into the neck layer of the model. This integration aims to enhance the model’s capacity to effectively capture critical features, thereby significantly improving the accuracy of detecting targets related to substation equipment. By leveraging the EMA attention mechanism, the model becomes more adept at focusing on relevant information, which ultimately leads to better performance in identifying and classifying key components within the operational environment.
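For reference, the sketch below follows the published EMA structure [34]: channels are regrouped into the batch dimension, a 1 × 1 branch with directional pooling recalibrates channel weights, a 3 × 3 branch captures local context, and cross-dimensional matrix products aggregate the two branches. Treat it as an illustration of the mechanism rather than the authors’ exact code.

```python
import torch
import torch.nn as nn

class EMA(nn.Module):
    """Sketch of efficient multi-scale attention (EMA) following [34]."""
    def __init__(self, channels: int, groups: int = 8):
        super().__init__()
        self.groups = groups
        self.softmax = nn.Softmax(dim=-1)
        self.agp = nn.AdaptiveAvgPool2d((1, 1))        # global pooling
        self.pool_h = nn.AdaptiveAvgPool2d((None, 1))  # pool along width
        self.pool_w = nn.AdaptiveAvgPool2d((1, None))  # pool along height
        self.gn = nn.GroupNorm(channels // groups, channels // groups)
        self.conv1x1 = nn.Conv2d(channels // groups, channels // groups, 1)
        self.conv3x3 = nn.Conv2d(channels // groups, channels // groups, 3, padding=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.size()
        g = x.reshape(b * self.groups, -1, h, w)       # sub-feature groups -> batch dim
        # 1x1 branch: directional pooling, shared 1x1 conv, sigmoid recalibration
        x_h = self.pool_h(g)                           # (bg, c/g, h, 1)
        x_w = self.pool_w(g).permute(0, 1, 3, 2)       # (bg, c/g, w, 1)
        hw = self.conv1x1(torch.cat([x_h, x_w], dim=2))
        x_h, x_w = torch.split(hw, [h, w], dim=2)
        x1 = self.gn(g * x_h.sigmoid() * x_w.permute(0, 1, 3, 2).sigmoid())
        x2 = self.conv3x3(g)                           # 3x3 branch: local context
        # cross-dimensional aggregation of the two branches
        a1 = self.softmax(self.agp(x1).reshape(b * self.groups, -1, 1).permute(0, 2, 1))
        a2 = self.softmax(self.agp(x2).reshape(b * self.groups, -1, 1).permute(0, 2, 1))
        v1 = x2.reshape(b * self.groups, c // self.groups, -1)
        v2 = x1.reshape(b * self.groups, c // self.groups, -1)
        weights = (a1 @ v1 + a2 @ v2).reshape(b * self.groups, 1, h, w)
        return (g * weights.sigmoid()).reshape(b, c, h, w)
```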

3.4. Integration of Small Target Detection Layer in Head Section

In infrared images of substation equipment, large equipment such as arresters, circuit breakers, and bushings, as well as smaller equipment like insulators, exhibit varying bounding box sizes. Among these, features of small targets are often weak and challenging to recognize clearly in the images. For the YOLOv8n model with a high downsampling rate, learning and extracting feature information of small targets in deeper feature maps is notably difficult. Additionally, equipment in transmission lines often obscure each other, further increasing the detection difficulty.
In the original YOLOv8n network, the input image is scaled to a size of 640 × 640 × 3, resulting in a limited receptive field in the shallow layers and a weak capacity to express semantic information. To better extract defect details and features of small targets, the improved YOLOv8_Adv model incorporates an additional upsampling layer into its existing architecture, linking this layer to the prediction component in the detection head. This modification significantly boosts the model’s ability to accurately detect small substation equipment. This improvement enables a detection scale of 160 × 160 × 32, as illustrated in the feature fusion structure in Figure 7.
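The arithmetic behind the new scale can be checked directly: with a 640 × 640 input, one additional 2× upsampling of the stride-8 neck feature aligns it with the stride-4 backbone feature at 160 × 160. The channel widths in this sketch are assumptions for illustration.

```python
import torch
import torch.nn as nn

# Shape check for the added small-object branch (channel widths assumed):
# upsampling the 80 x 80 (stride-8) neck feature and concatenating it with
# the 160 x 160 (stride-4) backbone feature yields the extra detection scale.
p3 = torch.zeros(1, 64, 80, 80)      # neck feature at stride 8 (640 / 8 = 80)
c2 = torch.zeros(1, 32, 160, 160)    # shallow backbone feature at stride 4
up = nn.Upsample(scale_factor=2, mode="nearest")
p2 = torch.cat((up(p3), c2), dim=1)  # fused small-object feature map
print(p2.shape)                      # torch.Size([1, 96, 160, 160])
```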
Although the integration of a small-target detection layer slightly increases the computational demands, it substantially improves detection accuracy for small targets. By focusing specifically on these smaller, more challenging objects, the added layer raises the model’s precision and reliability, which is particularly valuable in applications where accurately identifying small targets drives overall performance, such as surveillance or scenes with complex backgrounds. It also enhances the model’s overall performance when processing complex images of electrical equipment, ensuring more reliable and precise detection in challenging environments. The trade-off of increased computation is thus justified by the marked gains in accuracy and effectiveness.

3.5. The Enhanced YOLOv8_Adv Model

Figure 8 illustrates the enhanced YOLOv8_Adv network structure. In contrast to the base network, the optimized YOLOv8_Adv network has been specifically designed for the maintenance and monitoring of substation electrical equipment based on infrared images, especially for the difficult task of detecting small targets in intricate infrared backgrounds.
Initially, the backbone is improved by substituting the original C2f module with the C2f-fast module, which reduces the computational complexity while improving feature fusion capabilities, making the network more efficient in processing infrared images.
In the neck, lightweight GSConv and VoVGSCSP modules have been introduced as feature extraction units, aimed at capturing critical features more efficiently, thus reducing the overall computational load. Additionally, the EMA attention mechanism has been integrated within the neck to further enhance the model’s sensitivity to key features, improving its accuracy in capturing essential features in infrared images.
Finally, a specialized detection layer has been added to the head to improve the detection accuracy and recall rate for small targets in infrared backgrounds within substations. This layer ensures that the model can accurately identify and recall smaller objects that are often challenging to detect, meeting critical requirements for infrared-based substation maintenance.
With these enhancements, the YOLOv8_Adv model now meets the precision, computational efficiency, and detection requirements for infrared operational monitoring in substations, offering superior performance in identifying and locating small electrical equipment in complex infrared environments.

4. Experimental Results and Discussion

4.1. Dataset Description

The image dataset used for model training was obtained from Roboflow [35]; the images were captured with a FLIR T600 single-band infrared thermal imager (FLIR, Wilsonville, OR, USA). This thermal imager operates within a spectral range of 7.5–14 μm, in the long-wave infrared (LWIR) region, which is well suited to capturing the thermal radiation emitted by objects at temperatures typical of industrial environments. With a spatial resolution of 640 × 480 pixels, it enables detailed thermal imaging of substation components. The dataset includes images from multiple substations, rather than a single site, and covers seven types of equipment: isolators, circuit breakers, bushings, current transformers, isolation switch 1, isolation switch 2, and insulators. A total of 1670 images were collected, covering both normal operating conditions and fault states, thus reflecting a wide range of scenarios encountered in practical applications. The dataset used in this study is available for download from [36] for further research and applications.
Due to the limited number of original samples, various data augmentation techniques were employed to expand the dataset, including rotation, flipping, brightness adjustment, and Gaussian noise addition. Additionally, to accommodate the input requirements of the network model, the resolution of the original images was adjusted from 640 × 480 to 640 × 640. After applying these data augmentation methods, a total of 10,020 infrared images were obtained. The augmented dataset was strategically divided into training, validation, and test sets in a ratio of 6:2:2 to ensure comprehensive model training and performance evaluation. Each image was meticulously annotated using the LabelImg tool to accurately identify the device types and their locations within the images. This rigorous annotation process ensures the reliability and consistency of the dataset, laying a solid foundation for effective model training.
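A sketch of such an augmentation pipeline is shown below using the albumentations library; the specific parameter values (rotation limit, noise variance, and so on) are illustrative assumptions, not the exact settings used in the paper.

```python
import albumentations as A

# Augmentations described above: rotation, flipping, brightness adjustment,
# Gaussian noise, plus resizing to the 640 x 640 network input.
# Parameter names follow albumentations 1.x; values are illustrative.
augment = A.Compose(
    [
        A.Rotate(limit=15, p=0.5),
        A.HorizontalFlip(p=0.5),
        A.RandomBrightnessContrast(brightness_limit=0.2, contrast_limit=0.0, p=0.5),
        A.GaussNoise(var_limit=(10.0, 50.0), p=0.3),
        A.Resize(640, 640),
    ],
    # keep YOLO-format bounding boxes consistent with each transform
    bbox_params=A.BboxParams(format="yolo", label_fields=["class_labels"]),
)
```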
In addition, Figure 9 visualizes the dataset. As the figure shows, the current transformer and insulator categories contain more images than the other five categories, because these two types of power components are the most common in actual substations; the data distribution therefore matches reality. The main difference between isolation switch 1 and isolation switch 2 is that isolation switch 1 serves larger circuit-isolation needs and supports high voltage and high current, while isolation switch 2 serves isolation needs over a smaller range and has a simpler structure. The figure not only shows the quality and detail of the annotated images but also illustrates that such high-quality annotation is crucial to enhancing the model’s ability to detect targets in infrared images. In short, this dataset focuses on small object detection in infrared images and covers various complex situations that may occur during substation operation, providing a solid foundation for model evaluation and practical application.

4.2. Experimental Setup and Comparative Metrics

The experimental system utilized in this study is powered by an Intel Core i9-10940X processor (base frequency 3.3 GHz, 28 threads) running Ubuntu 22.04, with an NVIDIA RTX 3080 Ti GPU and CUDA 12.2. Training is conducted in the PyTorch deep learning framework with a batch size of 32 for 500 epochs.
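With the Ultralytics API, a training run matching this configuration could look roughly like the following; the dataset YAML filename is hypothetical, and the YOLOv8_Adv architecture itself would be supplied via a custom model YAML.

```python
from ultralytics import YOLO

# Train with the settings reported above: 640 x 640 input, batch size 32,
# 500 epochs. "substation_ir.yaml" is a hypothetical dataset config file.
model = YOLO("yolov8n.yaml")
model.train(data="substation_ir.yaml", imgsz=640, batch=32, epochs=500)
```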
To analyze the experimental results, this study employs mean average precision (mAP) and average precision (AP) for each category as metrics for evaluating the algorithm’s performance. Average precision is determined by calculating the area under the precision (P) and recall (R) curve. The formulas for calculating P, R, and AP are given below:
$\mathrm{Precision} = \dfrac{TP}{TP + FP}$
$\mathrm{Recall} = \dfrac{TP}{TP + FN}$
$AP = \int_{0}^{1} P(R)\,\mathrm{d}R$
where TP, FP, and FN denote true positives, false positives, and false negatives, respectively, and $P(R)$ is the precision as a function of recall.
Generally, the higher the detection rate for a specific category, the larger the AP value. However, when evaluating multi-class performance across the entire dataset, mAP is more appropriate: it averages the AP values over all categories, providing a measure of the model’s overall detection performance. In this study, the evaluation metric is the average detection precision at an intersection over union (IoU) threshold of 0.5, denoted mAP@0.5 and computed as follows:
$mAP@0.5 = \dfrac{1}{N} \sum_{i=1}^{N} AP_i \Big|_{IoU_{th} = 0.5}$
where $N$ denotes the total number of detected categories and $AP_i$ indicates the detection accuracy for category $i$.
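In code, AP reduces to the area under the P–R curve and mAP@0.5 to the mean over categories. The sketch below uses the common all-point interpolation and is an illustration rather than the exact evaluation script.

```python
import numpy as np

def average_precision(recall: np.ndarray, precision: np.ndarray) -> float:
    """AP as the area under the precision-recall curve (all-point interpolation).
    `recall` is assumed sorted in ascending order."""
    r = np.concatenate(([0.0], recall, [1.0]))  # pad to the full recall range
    p = np.concatenate(([1.0], precision, [0.0]))
    p = np.maximum.accumulate(p[::-1])[::-1]    # monotone precision envelope
    return float(np.trapz(p, r))                # integral of P(R) dR

# mAP@0.5: mean AP over the N categories, with matches decided at IoU >= 0.5
# map50 = np.mean([average_precision(r_i, p_i) for r_i, p_i in per_class_curves])
```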

4.3. Experimental Results and Comparative Evaluation

4.3.1. Performance of Device Identification

To assess the recognition effectiveness of the enhanced YOLOv8_Adv model on different devices, this section will analyze its classification results using a confusion matrix. The classification results are shown in Figure 10.
The results of the experiment reveal that the YOLOv8_Adv model shows a remarkable performance in classifying infrared images for substation electrical equipment, with an average accuracy rate of 96%. The model attained high accuracy across various equipment categories, verifying its robustness and effectiveness. Specifically, the classification accuracy for the isolator and isolation switch 1 reached 100%, demonstrating its exceptional precision and reliability in recognizing these devices and effectively handling complex infrared backgrounds. For the circuit breakers, the model achieved an accuracy of 95%, while for the bushings, it reached 94%, showing that YOLOv8_Adv maintains strong recognition performance for medium-sized devices with a low error rate. For the isolation switch 2 and insulators, the model’s accuracy reached 96% and 97%, respectively, further demonstrating its accuracy in detecting smaller devices.
However, the experiment also revealed some common types of errors, particularly false positives and false negatives. Specifically, the current transformer is occasionally misclassified as an isolator, which appears as a false positive in the confusion matrix. The underlying reason for this might be the similarity in infrared features between the current transformer and the isolator, making it difficult for the model to distinguish between the two devices. These errors typically occur when the model mistakenly classifies an actual current transformer as an isolator, thereby incorrectly assigning it to the isolator category.
On the other hand, false negative errors were also observed in the experiment, particularly when multiple different devices were misclassified as background. These errors indicate that the model failed to recognize some devices, possibly due to interference from the background or the small size of the devices, which makes them harder to detect effectively in the image. Such misclassifications reflect the model’s tendency to lose track of targets in more complex or occluded scenarios.
These results indicate that the YOLOv8_Adv model, with optimized backbone, neck, and head structures, has effectively enhanced accuracy and recall in small device recognition, especially in complex infrared backgrounds with multiple-object interference, while maintaining high detection precision. Overall, the YOLOv8_Adv model meets the high-precision requirements of real-world engineering tasks in the infrared detection of electrical equipment, making it particularly suitable for infrared image monitoring in substations and facilitating the automated monitoring and maintenance of equipment conditions.

4.3.2. Comparison of Different Testing Methods

In order to rigorously assess the performance enhancements of the refined YOLOv8_Adv model, a thorough comparative evaluation is performed, benchmarking it against five established baseline models: YOLOv8n [37], YOLOv8_FasterNet [38], YOLOv8_Gsconv [39], YOLOv8_fastC2f [40], and YOLOv8_Biformer [41].
Figure 11 presents a comparative analysis of the P–R curves for the improved YOLOv8_Adv model against various baseline models.
As illustrated, the YOLOv8_Adv model exhibits superior performance on the P–R curve, with both precision and recall approaching the ideal value of 1. This indicates that the model effectively identifies positive samples in detection tasks while maintaining a relatively low false positive rate, achieving an improvement of approximately 4.1% compared to the original YOLOv8n. In contrast, the YOLOv8_FasterNet and YOLOv8_Biformer models demonstrate satisfactory precision within high recall intervals, yet their overall performance does not match that of YOLOv8_Adv. Furthermore, the remaining models, YOLOv8n, YOLOv8_Gsconv, and YOLOv8_fastC2f, exhibit a significant decline in precision at elevated recall levels, suggesting that they may sacrifice accuracy while enhancing recall, potentially resulting in a higher false positive rate.
Figure 12 illustrates the comparison of mAP@0.5 curves between the improved YOLOv8_Adv and the other baseline models. The results show that once training exceeds 65 epochs, the average precision of YOLOv8_Adv significantly outperforms the baseline models. After 500 epochs, the mAP@0.5 of YOLOv8_Adv increased from 94.9% to 98.7%, a relative improvement of 4.0%.
It is evident that YOLOv8_Adv achieves the highest mAP@0.5 towards the end of the training phase, maintaining stability thereafter. This indicates that the model possesses superior learning and generalization capabilities. While other models also stabilize by the end of training, they do not reach the same level as YOLOv8_Adv during the same period (e.g., between epochs 20 and 80). Furthermore, the convergence speed highlights a key difference: YOLOv8_Adv experiences a rapid increase in mAP@0.5 in the early stages of training, indicating fast convergence and demonstrating its ability to effectively extract features.
Table 1 presents a comparison of data values between the improved YOLOv8_Adv model and the baseline models. The comparison is conducted using a training dataset composed exclusively of augmented infrared images of substation equipment. The augmentation process significantly improves the model’s robustness and performance by introducing a wide variety of scenarios and environmental conditions.
The data presented in Table 1 indicate that, for 640 × 640 pixel images, the YOLOv8_Adv model demonstrates the best overall performance, with the highest P and mAP@0.5 among all compared models, reaching 97.3% and 98.7%, respectively. Moreover, despite a marginal 0.1% reduction in recall (R) compared to the top-performing YOLOv8_Biformer model, YOLOv8_Adv achieves a significantly lower GFLOPs value of just 10.1, underscoring its superior efficiency and precision in object detection tasks. Compared to the baseline YOLOv8n model, YOLOv8_Adv improves P, R, and mAP@0.5 by 5%, 5.2%, and 3.9%, respectively, while its GFLOPs value decreases by 2.3, further demonstrating its significant performance enhancement relative to the baseline. It is worth noting that, when focusing solely on the GFLOPs value, YOLOv8_FasterNet offers relatively good performance with the lowest computational load of 9.2 GFLOPs, indicating that it is more advantageous in resource-limited application scenarios and suitable for object detection on devices with constrained computing capabilities. Overall, these data reflect the differences in efficiency and accuracy among the various models, providing strong evidence for selecting models suited to specific application needs, while emphasizing the superior performance of the YOLOv8_Adv model.

4.3.3. Ablation Experiment

An ablation experiment was performed on the same dataset to evaluate the influence of the introduced modules on the detection performance of infrared devices, with the results presented in Table 2. The comparison highlights that the YOLOv8n-C, YOLOv8n-CG, YOLOv8n-CGV, and YOLOv8_Adv models demonstrate notable enhancements in mAP@0.5 over the baseline YOLOv8n model, achieving improvements of 2.1%, 2.7%, 3.4%, and 4.1%, respectively. This indicates that the introduced modules play a positive role in enhancing detection performance.
Specifically, YOLOv8n-C successfully reduced the computational load by incorporating the C2f-fast module into the backbone network, leading to an 8.9% decrease in GFLOPs. This result demonstrates the effectiveness of its lightweight improvement, enabling the model to perform computations more efficiently while maintaining a certain level of accuracy. YOLOv8n-CG adopted the GSConv module in the neck part, which not only improved the model’s accuracy but also reduced GFLOPs by 13.7%. This improvement highlights the dual advantages of the GSConv module in enhancing performance and reducing computational complexity, thereby increasing the model’s applicability in real-world scenarios.
Furthermore, YOLOv8n-CGV combined the GSConv and VoVGSCSP modules in the neck part, resulting in an 18.5% reduction in GFLOPs. This combination not only optimized the model’s computational efficiency but also enhanced its detection capability in complex scenarios, demonstrating the effectiveness of multi-module collaborative work. Finally, expanding upon YOLOv8n-CGV, the YOLOv8_Adv model incorporates the EMA attention mechanism, which significantly strengthens its ability to extract and utilize key features, leading to substantial accuracy gains. These experimental results validate the contributions of individual modules and highlight the strategic advantage of modular design in optimizing object detection systems.

4.3.4. Visual Comparison of Recognition Effects

To verify the model’s robustness in detecting devices across complex background conditions, this study conducts a comparative analysis with the YOLOv8n model. The assessment leverages challenging and representative images from the infrared test dataset, as shown in Figure 13.
From the figure, it is clear that the YOLOv8_Adv model outperforms the YOLOv8n model. In the detection results, bounding boxes are drawn around the identified equipment, and each box is labeled with the predicted device category (e.g., “breaker”, “isolator”) and a confidence score reflecting the model’s confidence in the prediction. These bounding boxes are key to assessing the model’s accuracy and object localization.
According to the data provided in the figure, the YOLOv8n model achieved an average detection accuracy of 87.9% on a randomly selected set of 16 infrared images, whereas the YOLOv8_Adv model achieved 93.05% on the same images, a relative improvement of 5.86% over YOLOv8n. This significant improvement highlights the effectiveness of YOLOv8_Adv in handling complex scenes and dynamic backgrounds. The enhanced model generated more accurate bounding boxes that aligned more closely with the objects, reducing both misidentifications of non-target objects and missed detections of actual objects.
Moreover, further analysis shows that YOLOv8_Adv also performs excellently in multi-object detection tasks, with accuracy improving from 86.67% to 92.17%. The detection results in the figure display multiple bounding boxes per image, each corresponding to a different object in the scene, indicating that the improved model identifies multiple objects within a single image more stably and capably, especially in complex or crowded scenes.
These practical detection data clearly indicate that the YOLOv8_Adv model maintains a high level of detection accuracy while achieving network lightweighting, effectively providing a flexible solution tailored to the application requirements of mobile devices. In conclusion, these findings underscore the superior performance of the YOLOv8_Adv model in practical applications and emphasize the importance of its further adoption and integration in infrared image detection technologies.

4.3.5. Computational Efficiency Analysis

The task of small target detection in infrared images places high demands on the accuracy and real-time performance of algorithms, making the selection of an appropriate model particularly important. To this end, this paper systematically analyzes the performance of various YOLOv8 variant models, focusing on the number of layers, parameter count, inference speed (FPS), and GFLOPs, to evaluate the ability of these models to achieve high-precision detection while meeting real-time requirements. This analysis not only provides a theoretical foundation for validating the effectiveness of the proposed models but also offers a valuable reference for their deployment in different practical application scenarios.
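Parameter counts and FPS figures of this kind can be measured with a few lines of PyTorch; the sketch below assumes a CUDA device, batch size 1, and a 640 × 640 input, with warm-up iterations before timing.

```python
import time
import torch

@torch.no_grad()
def count_params_and_fps(model: torch.nn.Module, imgsz: int = 640,
                         n_iters: int = 200, device: str = "cuda"):
    """Returns (parameter count, frames per second) for single-image inference."""
    model = model.to(device).eval()
    n_params = sum(p.numel() for p in model.parameters())
    x = torch.randn(1, 3, imgsz, imgsz, device=device)
    for _ in range(20):          # warm-up before timing
        model(x)
    torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(n_iters):
        model(x)
    torch.cuda.synchronize()     # wait for queued GPU work before stopping the clock
    return n_params, n_iters / (time.perf_counter() - start)
```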
Table 3 provides a comparison of the computational efficiency of various YOLOv8 variant models. From the data, it can be observed that YOLOv8n, with the lowest parameter count of 2.92 M and a relatively low computational complexity of 12.4 GFLOPs, achieves an inference speed of 175.4 FPS, demonstrating a significant advantage in speed-sensitive tasks. YOLOv8_FasterNet further reduces the parameter count to 1.72 M and the computational complexity to 9.2 GFLOPs, exhibiting extreme lightweight characteristics, but its inference speed is limited to 137.0 FPS, making it suitable for resource-constrained scenarios. YOLOv8_Gsconv achieves the highest inference speed of 192.3 FPS with a moderate parameter count of 2.64 M and a relatively low computational complexity of 11.6 GFLOPs, striking an ideal balance between real-time performance and computational efficiency. Similarly, YOLOv8_fastC2f, with a slightly lower parameter count of 2.56 M and computational complexity of 11.1 GFLOPs, achieves an inference speed of 181.8 FPS, exhibiting efficiency comparable to YOLOv8_Gsconv. In contrast, although YOLOv8_Biformer has the highest parameter count of 2.93 M and computational complexity of 12.7 GFLOPs, its inference speed is 172.4 FPS, making it more suitable for precision-focused complex scenarios.
The proposed YOLOv8_Adv model achieves a good balance in three aspects: 344 layers, 2.34 M parameters, and a computational complexity of 10.1. Although its inference speed of 163.9 FPS is slightly lower than that of some variants, it significantly enhances the feature extraction capability through a complex network structure, performing exceptionally well in small target detection tasks in infrared images. Especially when dealing with complex backgrounds and weak target features, this model can significantly improve the detection accuracy while fully meeting real-time requirements in practical applications. This indicates that the YOLOv8_Adv model strikes an ideal trade-off between accuracy and speed, providing an efficient and reliable solution for small target detection in infrared images. In conclusion, the various models exhibit different optimization orientations in terms of parameter count, computational complexity, and inference speed, providing a range of options to meet diverse practical needs.

5. Conclusions

This paper introduces an upgraded YOLOv8_Adv model aimed at overcoming the difficulties in detecting and recognizing electrical equipment in infrared images. Building on an in-depth analysis of the YOLOv8 model, comprehensive optimizations were applied to its structure by incorporating the FasterNet module with PConv into the backbone, integrating the GSConv and VoVGSCSP modules in the neck, and adding the EMA attention mechanism. Furthermore, a detection layer designed for small objects was integrated into the head to boost the model’s efficiency in identifying electrical equipment in infrared images. These improvements significantly enhance the model’s lightweight nature and detection accuracy. The findings indicate that the detection precision across seven categories of electrical equipment rose from 94.8% to 98.7%, accompanied by an 18.5% reduction in GFLOPs, effectively fulfilling the demands of substation equipment monitoring using infrared imagery. However, challenges persist in scenarios such as defect detection, continuous 24 h monitoring, and temperature-based detection using infrared temperature data. Future work will focus on processing infrared image and video data of electrical equipment and on deploying the enhanced model on edge computing platforms for drones, enabling practical applications on mobile devices.

Author Contributions

Conceptualization, H.T. and Z.W.; methodology, H.T.; validation, A.P. and Z.W.; resources, Z.W.; visualization, A.P.; writing—original draft preparation, H.T. and A.P.; writing—review and editing, A.P. and Z.W.; supervision, Z.W.; project administration, Z.W.; funding acquisition, Z.W. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Key Foundation of Zhejiang Provincial Natural Science of China under Grant No. LZ22F010005.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data presented in this study are available on request from the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Ge, H.; Ding, J.; Dong, Y. Review of Big Data Analysis Technology for Power Equipment State. In Proceedings of the 2023 2nd Asian Conference on Frontiers of Power and Energy (ACFPE), Chengdu, China, 20–22 October 2023; pp. 89–93.
  2. Wang, Z.; Gao, Q.; Xu, J.; Li, D. A Review of UAV Power Line Inspection. In Advances in Guidance, Proceedings of 2020 International Conference on Guidance, Navigation and Control, ICGNC 2020, Tianjin, China, 23–25 October 2020; Yan, L., Duan, H., Yu, X., Eds.; Springer: Singapore, 2022; pp. 3147–3159.
  3. Yang, L.; Fan, J.; Liu, Y.; Li, E.; Peng, J.; Liang, Z. A Review on State-of-the-Art Power Line Inspection Techniques. IEEE Trans. Instrum. Meas. 2020, 69, 9350–9365.
  4. Cao, Y.; Xu, H.; Su, C.; Yang, Q. Accurate Glass Insulators Defect Detection in Power Transmission Grids Using Aerial Image Augmentation. IEEE Trans. Power Deliv. 2023, 38, 956–965.
  5. Tang, N.; Landas, N.; Rodrigues, Y.R.; Monteiro, M.R. Industry State-of-Art and Opportunities for the Use of Drones in Smart Grids Inspections. In Proceedings of the 2024 International Symposium on Power Electronics, Electrical Drives, Automation and Motion (SPEEDAM), Ischia, Italy, 19–21 June 2024; pp. 327–331.
  6. Liu, Y.; Li, X.; Qiao, R.; Chen, Y.; Han, X.; Paul, A.; Wu, Z. Lightweight Insulator and Defect Detection Method Based on Improved YOLOv8. Appl. Sci. 2024, 14, 8691.
  7. Li, Y.; Feng, D.; Li, S. A Review of Transmission Line Defect Detection Based on Deep Learning Object Detection Techniques. In Proceedings of the 18th Annual Conference of China Electrotechnical Society, Beijing, China, 15–17 September 2024; Springer Nature: Singapore, 2024; pp. 295–309.
  8. Liu, Y.; Liu, D.; Huang, X.; Li, C. Insulator Defect Detection with Deep Learning: A Survey. IET Gener. Transm. Distrib. 2023, 17, 3541–3558.
  9. Ullah, I.; Khan, R.U.; Yang, F.; Wuttisittikulkij, L. Deep Learning Image-Based Defect Detection in High Voltage Electrical Equipment. Energies 2020, 13, 392.
  10. Wu, C.; Wu, Y.; He, X. Infrared Image Target Detection for Substation Electrical Equipment Based on Improved Faster Region-Based Convolutional Neural Network Algorithm. Rev. Sci. Instrum. 2024, 95, 043702.
  11. Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 1137–1149.
  12. Ou, J.; Wang, J.; Xue, J.; Wang, J.; Zhou, X.; She, L.; Fan, Y. Infrared Image Target Detection of Substation Electrical Equipment Using an Improved Faster R-CNN. IEEE Trans. Power Deliv. 2023, 38, 387–396.
  13. Zhang, N.; Yang, G.; Wang, D.; Hu, F.; Yu, H.; Fan, J. A Defect Detection Method for Substation Equipment Based on Image Data Generation and Deep Learning. IEEE Access 2024, 12, 105042–105054.
  14. Liu, T.; Li, G.; Gao, Y. Fault Diagnosis Method of Substation Equipment Based on You Only Look Once Algorithm and Infrared Imaging. Energy Rep. 2022, 8, 171–180.
  15. Shan, D.; Yang, S.; Zhao, Z.; Gao, X.; Zhang, B.; Zhang, B. Lightweight Infrared Object Detection Network Based on Improved SSD. In Proceedings of the 2023 6th International Conference on Algorithms, Computing and Artificial Intelligence, Sanya, China, 24 December 2023; Association for Computing Machinery: New York, NY, USA, 2024; pp. 347–352.
  16. Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.-Y.; Berg, A.C. SSD: Single Shot MultiBox Detector. In Computer Vision—ECCV 2016; Leibe, B., Matas, J., Sebe, N., Welling, M., Eds.; Springer International Publishing: Cham, Switzerland, 2016; Volume 9905, pp. 21–37.
  17. Ding, L.; Xu, X.; Cao, Y.; Zhai, G.; Yang, F.; Qian, L. Detection and Tracking of Infrared Small Target by Jointly Using SSD and Pipeline Filter. Digit. Signal Process. 2021, 110, 102949.
  18. Zhai, X.; Huang, Z.; Li, T.; Liu, H.; Wang, S. YOLO-Drone: An Optimized YOLOv8 Network for Tiny UAV Object Detection. Electronics 2023, 12, 3664.
  19. Huangfu, Z.; Li, S. Lightweight You Only Look Once v8: An Upgraded You Only Look Once v8 Algorithm for Small Object Identification in Unmanned Aerial Vehicle Images. Appl. Sci. 2023, 13, 12369.
  20. Senussi, M.F.; Kang, H.-S. Occlusion Removal in Light-Field Images Using CSPDarknet53 and Bidirectional Feature Pyramid Network: A Multi-Scale Fusion-Based Approach. Appl. Sci. 2024, 14, 9332.
  21. Yang, S.; Wang, W.; Gao, S.; Deng, Z. Strawberry Ripeness Detection Based on YOLOv8 Algorithm Fused with LW-Swin Transformer. Comput. Electron. Agric. 2023, 215, 108360.
  22. Fan, Y.; Zhang, L.; Li, P. A Lightweight Model of Underwater Object Detection Based on YOLOv8n for an Edge Computing Platform. J. Mar. Sci. Eng. 2024, 12, 697.
  23. Wang, A.; Yuan, P.; Wu, H.; Iwahori, Y.; Liu, Y. Improved YOLOv8 for Dangerous Goods Detection in X-Ray Security Images. Electronics 2024, 13, 3238.
  24. Kumar, D.; Muhammad, N. Object Detection in Adverse Weather for Autonomous Driving through Data Merging and YOLOv8. Sensors 2023, 23, 8471.
  25. Sharma, P.; Saurav, S.; Singh, S. Object Detection in Power Line Infrastructure: A Review of the Challenges and Solutions. Eng. Appl. Artif. Intell. 2024, 130, 107781.
  26. Li, D.; Han, T.; Zhou, H.; Chai, H. Lightweight Siamese Network for Visual Tracking via FasterNet and Feature Adaptive Fusion. In Proceedings of the 2024 5th International Seminar on Artificial Intelligence, Networking and Information Technology (AINIT), Nanjing, China, 22–24 March 2024; pp. 1–5.
  27. Xia, C.; Ren, M.; Wang, B.; Dong, M.; Xu, G.; Xie, J.; Zhang, C. Infrared Thermography-Based Diagnostics on Power Equipment: State-of-the-Art. High Volt. 2021, 6, 387–407.
  28. Shi, Z.; Zhao, Q.; Su, L.; Su, Y.; Yan, N. Equipment Anomaly Detection in Power Grids Using Deep Learning. In Proceedings of the 2021 International Conference on Intelligent Computing, Automation and Systems (ICICAS), Chongqing, China, 29–31 December 2021; pp. 271–275.
  29. Hoang, V.-T.; Jo, K.-H. Practical Analysis on Architecture of EfficientNet. In Proceedings of the 2021 14th International Conference on Human System Interaction (HSI), Gdansk, Poland, 8–10 July 2021; pp. 1–4.
  30. Chiu, Y.-C.; Tsai, C.-Y.; Ruan, M.-D.; Shen, G.-Y.; Lee, T.-T. Mobilenet-SSDv2: An Improved Object Detection Model for Embedded Systems. In Proceedings of the 2020 International Conference on System Science and Engineering (ICSSE), Kagawa, Japan, 31 August–3 September 2020; pp. 1–5.
  31. Sun, J.; Ge, H.; Zhang, Z. AS-YOLO: An Improved YOLOv4 Based on Attention Mechanism and SqueezeNet for Person Detection. In Proceedings of the 2021 IEEE 5th Advanced Information Technology, Electronic and Automation Control Conference (IAEAC), Chongqing, China, 12–14 March 2021; Volume 5, pp. 1451–1456.
  32. Lee, Y.; Hwang, J.; Lee, S.; Bae, Y.; Park, J. An Energy and GPU-Computation Efficient Backbone Network for Real-Time Object Detection. In Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Long Beach, CA, USA, 16–17 June 2019; pp. 752–760.
  33. Wang, C.-Y.; Mark Liao, H.-Y.; Wu, Y.-H.; Chen, P.-Y.; Hsieh, J.-W.; Yeh, I.-H. CSPNet: A New Backbone That Can Enhance Learning Capability of CNN. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Seattle, WA, USA, 14–19 June 2020; pp. 1571–1580.
  34. Ouyang, D.; He, S.; Zhang, G.; Luo, M.; Guo, H.; Zhan, J.; Huang, Z. Efficient Multi-Scale Attention Module with Cross-Spatial Learning. In Proceedings of the ICASSP 2023—2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Rhodes, Greece, 4–10 June 2023; pp. 1–5.
  35. Roboflow: Computer Vision Tools for Developers and Enterprises. Available online: https://roboflow.com/ (accessed on 3 November 2024).
  36. Model & API. Available online: https://universe.roboflow.com/shanghai-jiao-tong-university-xwhvl/infraredimage3/model/6 (accessed on 23 December 2024).
  37. Varghese, R.; Sambath, M. YOLOv8: A Novel Object Detection Algorithm with Enhanced Performance and Robustness. In Proceedings of the 2024 International Conference on Advances in Data Engineering and Intelligent Computing Systems (ADICS), Chennai, India, 18–19 April 2024; pp. 1–6.
  38. Liu, H.; Zhang, Y.; Liu, S.; Zhao, M.; Sun, L. UAV Wheat Rust Detection Based on FasterNet-YOLOv8. In Proceedings of the 2023 IEEE International Conference on Robotics and Biomimetics (ROBIO), Samui, Thailand, 4–9 December 2023; pp. 1–6.
  39. Li, Z.; Wang, J. Multi-Person Gesture Recognition for Complex Environmental Based on Improved Yolov8 Algorithm. In Proceedings of the 2023 5th International Academic Exchange Conference on Science and Technology Innovation (IAECST), Guangzhou, China, 8–10 December 2023; pp. 614–619.
  40. Hu, T.; Zhuang, D.; Qiu, J.; Zheng, L. Improved YOLOv8 Algorithm with C2f-DCNv3 and Shuffle Attention for Detection of Coal Shearer Drum Teeth. In Proceedings of the 2024 4th International Conference on Neural Networks, Information and Communication Engineering (NNICE), Guangzhou, China, 19–21 January 2024; pp. 1019–1022.
  41. Zhang, Y.; Wu, Z.; Wang, X.; Fu, W.; Ma, J.; Wang, G. Improved YOLOv8 Insulator Fault Detection Algorithm Based on BiFormer. In Proceedings of the 2023 IEEE 5th International Conference on Power, Intelligent Computing and Systems (ICPICS), Shenyang, China, 14–16 July 2023; pp. 962–965.
Figure 1. YOLOv8 network architecture diagram.
Figure 2. Structural diagram of the C2f-fast.
Figure 3. Structural diagram of the FasterNet block.
Figure 4. Structural diagram of the GSConv.
Figure 5. Diagram illustrating the structure of VoVGSCSP.
Figure 6. Structural diagram of the EMA.
Figure 7. Diagram of the feature fusion structure with the addition of a small target detection layer.
Figure 8. YOLOv8_Adv network structure diagram.
Figure 9. Training dataset.
Figure 10. Confusion matrix.
Figure 11. Comparative analysis of precision–recall (P–R) curves for different models.
Figure 12. mAP curve comparison of various models.
Figure 13. Visualized detection results: (a) YOLOv8n; (b) YOLOv8_Adv.
Table 1. Comparative experiments of YOLOv8_Adv to other models.

Models | Pixels | P/% | R/% | mAP@0.5/% | GFLOPs/G
YOLOv8n [37] | 640 × 640 | 92.3 | 91.4 | 94.8 | 12.4
YOLOv8_FasterNet [38] | 640 × 640 | 94.5 | 93.4 | 95.3 | 9.2
YOLOv8_Gsconv [39] | 640 × 640 | 96.6 | 94.0 | 97.9 | 11.6
YOLOv8_fastC2f [40] | 640 × 640 | 97.0 | 93.5 | 96.2 | 11.1
YOLOv8_Biformer [41] | 640 × 640 | 94.4 | 96.7 | 98.4 | 12.7
YOLOv8_Adv | 640 × 640 | 97.3 | 96.6 | 98.7 | 10.1
Table 2. Ablation experiments of the YOLOv8_Adv model.

Models | C2f-fast | GSConv | VoVGSCSP | EMA | P/% | R/% | mAP@0.5/% | GFLOPs/G
YOLOv8n | × | × | × | × | 92.3 | 91.4 | 94.8 | 12.4
YOLOv8n-C | ✓ | × | × | × | 95.2 | 94.0 | 96.8 | 11.3
YOLOv8n-CG | ✓ | ✓ | × | × | 96.3 | 95.7 | 97.4 | 10.7
YOLOv8n-CGV | ✓ | ✓ | ✓ | × | 96.8 | 96.0 | 98.1 | 10.1
YOLOv8_Adv | ✓ | ✓ | ✓ | ✓ | 97.3 | 96.6 | 98.7 | 10.1
Table 3. Comparison of computational efficiency across different models.

Models | Layers | Parameters | FPS | GFLOPs/G
YOLOv8n | 207 | 2,921,964 | 175.4 | 12.4
YOLOv8_FasterNet | 218 | 1,717,396 | 137.0 | 9.2
YOLOv8_Gsconv | 231 | 2,638,140 | 192.3 | 11.6
YOLOv8_fastC2f | 225 | 2,560,764 | 181.8 | 11.1
YOLOv8_Biformer | 215 | 2,932,140 | 172.4 | 12.7
YOLOv8_Adv | 344 | 2,341,644 | 163.9 | 10.1
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
