The emphasis on producing greener energy is a key pillar of the environmental policy of many developed and developing countries around the world. As a result, grid-connected photovoltaic (GCPV) systems have seen a considerable rise in popularity over the past decade. The wide-scale implementation of PV systems has exposed areas for improvement such as methods for fault detection [1]. System maintenance cost is an important economic factor. Conventional approaches rely on manual inspections by personnel, which are expensive and can introduce human error.
The use of artificial intelligence (AI) for solving complex problems has encouraged a wide range of industries to adopt AI-based algorithms for their needs. This trend can also be seen in the field of PV fault detection.
PV fault detection can be divided into three key categories: electrical, thermal, and visual [1]. This work focuses on a sub-category found within the electrical approach: investigating the deployment of AI algorithms on key electrical parameters obtained from a PV system for fault detection classification [2]. A deeper look into the electrical category brings to light further divisions into sub-categories, namely:
1.1. Literature Review
Much of the research literature focuses on reducing the number of input features and the complexity of machine learning-based algorithms for PV fault detection. This requires practitioners to evaluate and select the most important input variables for the network to achieve optimal accuracy. Defining a set of relevant features makes data collection and pre-processing more effective for the practitioner.
Millit et al. [6] demonstrated the implementation of artificial neural networks (ANNs) for the modelling and estimation of output power from GCPV installations. The research was based on measurements for one year (1 January 2011 to 24 February 2012) from a PV system located at Marmara University, Turkey. The parameters used for the model training were solar irradiance, voltage, current, and temperature. Dhimish [7] proposed an ANN-based approach for energy harvesting and failure mode prediction in a PV installation to aid dynamic maintenance tasks. The model inputs included environmental features such as external temperature and parameters of the module such as internal temperature and operating times.
Furthermore, Moreno et al. [8] presented an ANN-based algorithm for forecasting global horizontal irradiance (GHI). The focus of the ANN model was to predict local GHI for four neighboring locations from weather data provided by the US National Oceanic and Atmospheric Administration (NOAA). This research focused on forecasting a decisive parameter required for any further forecasting or even fault detection in PV systems. Chine et al. [9] proposed two algorithms, a multi-layer perceptron (MLP) and a radial basis function (RBF) network, for fault detection in the DC part of a PV system. Based on the solar irradiance and temperature of the PV modules, several features were created to be used as inputs for the model, including the PV current, voltage, and the number of peaks in the current. The results obtained from both models, evaluated through confusion matrices, showed that the MLP model had an accuracy of 90.3% compared with 68.4% for the RBF model. Notably, the dataset used for testing was relatively small (775 samples) and was obtained through simulation in MATLAB. This may have led to the model not capturing key relationships that may have otherwise been evident in a real-world dataset. The two algorithms are distinctly different: an MLP can accommodate many hidden layers within its architecture, but demands more computational power. Conversely, an RBF network uses a single hidden layer and can be more effective, depending on the application.
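To make the evaluation step concrete, the following minimal sketch shows how an overall accuracy figure is read off a confusion matrix: the correctly classified samples lie on the diagonal, and accuracy is their share of all samples. The matrix values and the function name are hypothetical and are not taken from the cited study.

```python
def accuracy_from_confusion(matrix):
    """Return overall accuracy: diagonal (correct) count divided by total count."""
    total = sum(sum(row) for row in matrix)
    if total == 0:
        raise ValueError("confusion matrix contains no samples")
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total


# Hypothetical 3-class confusion matrix (rows: true class, columns: predicted class).
# These counts are illustrative only; they are not results from any cited work.
example_matrix = [
    [50, 3, 2],
    [4, 40, 6],
    [1, 5, 44],
]

example_accuracy = accuracy_from_confusion(example_matrix)
print(f"accuracy = {example_accuracy:.3f}")
```

Accuracy alone can hide poor performance on rare fault classes, which is one reason the full confusion matrix is usually reported alongside it.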
In contrast, Muhammad et al. [10] demonstrated an RBF-based algorithm for PV fault detection; its accuracy was within 96.5–98.1%. The model required only two inputs, solar irradiance and PV output power. Testing the algorithm on a dataset obtained from a live installation provided further confidence in the accuracy of the model. Hussain and Chen [11] proposed a gradient-guided convolutional neural network architecture for enhanced micro-crack fault detection in PV systems. The proposed model analyzed the PV cell surface and achieved an F1 score of 98.8% for micro-crack detection.
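For readers less familiar with the F1 score quoted above, the sketch below computes it as the harmonic mean of precision and recall from true-positive, false-positive, and false-negative counts. The function name and the example counts are hypothetical and do not come from the cited work.

```python
def f1_score(tp, fp, fn):
    """F1 = harmonic mean of precision and recall, computed from raw counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0  # no true positives: define F1 as zero instead of dividing by zero
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts for a binary "micro-crack / no micro-crack" detector.
example_f1 = f1_score(tp=8, fp=2, fn=2)
print(f"F1 = {example_f1:.2f}")
```

Unlike accuracy, F1 ignores true negatives, so it is less flattering when defective cells are rare compared with healthy ones.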
ML methodologies fall into three clearly bounded categories, and the choice among them depends on the type of application: supervised, unsupervised, and reinforcement learning [12]. The content of the data, coupled with the end goal, assists researchers with the adoption of one of the above techniques [13]. There are cases where a hybrid architecture can be deployed to achieve the end goal; a commonly known approach is semi-supervised learning. In their paper, Yao et al. [14] highlighted the drawbacks of purely supervised learning, especially for sensor data, as the process of data annotation can become cumbersome. They presented a semi-supervised ML model based on probabilistic modelling for PV condition monitoring.
Ye et al. [15] presented a graph-based semi-supervised algorithm for fault detection and the classification of PV systems. The authors highlighted the non-linear characteristics in the PV systems, suggesting that ML was a suitable approach for this application. Furthermore, they justified a semi-supervised model rather than the conventional approach: first, due to the lack of availability of actual PV data that were labelled; and second, due to the difficulties faced whilst trying to update the deployed models. The proposed algorithm could detect PV faults and apply fault labelling to expedite a system recovery. A key highlight of the proposed system was its ability to learn over time as the weather changed.
Another work was presented by Giovanni et al. [16], who developed a neural network-based solution to detect system faults or cyber attacks within a PV system connected to the grid. The proposed model was based on anomaly detection: the critical PV parameters of the system were extracted and then processed through an auto-encoder-based neural network for behavioral classification. A similar work was also presented by Sunme et al. [17], who developed an imputation and fault detection model for a fleet of small-scale PV systems. K-means was implemented to cluster the neighboring data along with unlabeled data and to detect abnormal data points. The work was based on actual roof-top PV data. The clustering results provided error rates of 12.6% (with neighboring data) and 22.3% (without neighboring PV data).
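The general idea of clustering readings and flagging abnormal points can be sketched as follows. This is a minimal one-dimensional k-means (Lloyd's algorithm) under stated assumptions, not the method of the cited work: the readings, the deterministic initialisation, and the fixed distance threshold are all hypothetical choices made for illustration.

```python
def kmeans_1d(values, k, iterations=20):
    """Lloyd's algorithm on scalar values; returns (centroids, labels)."""
    ordered = sorted(values)
    # Deterministic initialisation (an assumption): spread centroids over the sorted data.
    centroids = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    labels = [0] * len(values)
    for _ in range(iterations):
        # Assignment step: each value joins its nearest centroid.
        labels = [min(range(k), key=lambda c: abs(v - centroids[c])) for v in values]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            if members:
                centroids[c] = sum(members) / len(members)
    return centroids, labels


def flag_abnormal(values, centroids, labels, threshold):
    """Mark a value as abnormal when it lies farther than `threshold` from its centroid."""
    return [abs(v - centroids[lab]) > threshold for v, lab in zip(values, labels)]


# Hypothetical normalised daily yields from neighbouring systems:
# a "sunny" group near 5.0, a "cloudy" group near 2.0, and one odd reading (3.6).
readings = [5.1, 4.9, 5.0, 5.2, 2.0, 2.1, 1.9, 2.2, 3.6]
centroids, labels = kmeans_1d(readings, k=2)
flags = flag_abnormal(readings, centroids, labels, threshold=0.8)
print([r for r, abnormal in zip(readings, flags) if abnormal])
```

In practice the threshold would be derived from the spread of each cluster instead of being fixed by hand, and the features would typically be multi-dimensional.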
Imtiaz et al. [18] proposed the use of support vector machines (SVM) for online power quality disturbance detection in smart meters. Their methodology included the segregation of a power disturbance from regular readings using a one-class support vector machine (OCSVM) [19]. To accurately detect the power disturbances of a voltage wave, practical wavelet filters were applied. Due to the unlimited types of waveform abnormalities, the OCSVM was selected as a semi-supervised machine learning algorithm that required training on a relatively large sample of standard data. Their model autonomously detected various types of disturbances in real-time, including unknown disturbances that were not catered for in the training dataset.
1.2. Paper Contribution and Organization
Summarizing the above literature, we observed that the selection of a specific algorithm by PV developers was arbitrary rather than based on a methodological framework. As a result, the performance of the developed classifier may be impressive; however, this does not make it the most computationally efficient option. In this paper, we did not arbitrarily select and train machine learning models for PV fault detection. Instead, this paper contributes to the research in the field of PV fault detection through AI by presenting a distinctive comparison between conventional statistical approaches and emerging ML algorithms. By doing so, this paper encourages researchers to deploy a ‘bottom-up’ approach for selecting the correct backbone architecture for PV fault detection. By presenting the results of both statistical and ML models, the reader can appreciate the importance of choosing the optimal methodology and the impact this has on factors such as computational demand, latency, and outlier handling.
Based on the premise above, we present a hybrid approach of statistics and ML for tackling PV fault detection. It is also important to note that within ML itself, we used an ensemble approach by combining unsupervised and supervised learning for data pre-processing, model training, testing, and post-deployment validation. This allowed us to address the real-world constraints of acquiring data such as missing data points, human errors, and outliers.
The paper is organized as follows: Section 2 presents our proposed methodology, featuring an in-depth analysis of statistical processing, clustering, and ML-based modelling. In Section 3, we present our model results for the initial dataset and examine the accuracy of the ML models using another PV system that included a noisy and missing dataset. Finally, Section 4 presents the critical outcomes/conclusions of the proposed work.