A Deep Reinforcement Learning Optimization Method Considering Network Node Failures

Ding, Xueying; Liao, Xiao; Cui, Wei; Meng, Xiangliang; Liu, Ruosong; Ye, Qingshan; Li, Donghe

doi:10.3390/en17174471


by Xueying Ding ¹, Xiao Liao ¹, Wei Cui ¹, Xiangliang Meng ¹, Ruosong Liu ²,*, Qingshan Ye ² and Donghe Li ²

¹ State Grid Information and Telecommunication Group Co., Ltd., Beijing 100029, China
² School of Automation Science and Engineering, Xi’an Jiaotong University, Xi’an 710049, China
* Author to whom correspondence should be addressed.

Energies 2024, 17(17), 4471; https://doi.org/10.3390/en17174471

Submission received: 10 August 2024 / Revised: 20 August 2024 / Accepted: 26 August 2024 / Published: 6 September 2024

(This article belongs to the Special Issue Artificial Intelligence and Machine Learning in Smart Grids)


Abstract:

Modern microgrid systems are characterized by increasingly diverse power components and a complex network structure. Existing studies on microgrid fault diagnosis and troubleshooting mostly focus on the fault detection and operation optimization of a single power device. However, as microgrid systems grow more complex, it becomes increasingly challenging to contain faults within a specific spatiotemporal range. This can lead to the spread of power faults, posing great harm to the safety of the microgrid. The deep-reinforcement-learning-based topology optimization method proposed in this paper takes the power grid as a whole and aims to minimize the overall failure rate of the microgrid by optimizing the grid topology. This approach can confine internal faults to a small range, greatly improving the safety and reliability of microgrid operation. The proposed method can optimize the network topology under both single-node and multi-node faults, reducing the influence range of node faults by 21% and 58%, respectively.

Keywords:

microgrid; topology; deep reinforcement; electric power safety

1. Introduction

The modernization and intelligent transformation of China’s power system is advancing rapidly, especially at the city and community levels, and the rise of microgrids [1,2,3,4,5] has become an important part of this process. Microgrids are small, self-managed, and self-controlled power systems that can be connected to the main power grid or operate independently. With the large-scale integration of renewable energy such as solar and wind power, and the spread of new power loads such as electric vehicle charging stations and energy storage systems, the structure of microgrids has become highly complex. This increases the flexibility and sustainability of the power system but also brings new challenges in grid fault management.

There are various types of faults [6,7,8,9] in microgrids, including but not limited to short circuits, overloads, harmonic interference, voltage fluctuations, etc. These faults may be caused by factors such as aging equipment, bad weather, human error, etc. Since distributed energy and loads are integrated in microgrids, the impact of faults is no longer limited to local areas in the traditional sense but may quickly spread to the entire microgrid and even the main grid connected to it, resulting in power outages and reduced service quality in a wider range.

Traditional grid fault detection and troubleshooting relies on manual inspections and automated systems based on fixed rules. This approach has many drawbacks when facing complex and changeable microgrid faults. First, manual inspections are time-consuming, and it is difficult to respond to sudden faults in a timely manner; second, automated systems based on fixed rules often lack intelligent judgment capabilities and cannot effectively handle unexpected faults; finally, when a fault occurs, there is no effective isolation mechanism, which expands the scope of the fault impact and prolongs the recovery time.

The topology of a power grid refers to the way in which the various components in a power network (such as generators, transformers, transmission lines, circuit breakers, and load points) are connected. Grid topologies can be divided into different levels and types, from macro-level national or regional power grids, to meso-level transmission networks and micro-level distribution networks, and each layer has its own specific structural characteristics. The distribution network is responsible for distributing electricity from substations to end users. Common distribution network topologies include radial, ring, and tree types. The radial structure is simple and low in cost, but a single point of failure may affect the power supply over a larger area; the ring and tree structures improve the reliability of the power supply, and even if a certain section of the line fails, the power supply can be restored through a bypass. Figure 1 shows the simple topology of a new power system microgrid.

In view of the above challenges, this paper proposes a deep reinforcement learning optimization method considering network node failures (DRL-NNF), which can quickly respond, accurately isolate faults, and suppress fault propagation as a grid security protection method.

The field of power grid fault safety detection involves a variety of cutting-edge methods and theories. Refs. [10,11,12,13,14,15] studied the fault types and locations within the power grid and proposed fault detection and diagnosis technologies based on artificial intelligence algorithms, laying the foundation for subsequent fault optimization and adjustment. Ref. [16] considers the influence of large-scale distributed photovoltaic power generation on the regional grid line loss, establishes a grid topology model with the minimum line loss as the goal, and jointly optimizes the grid topology by combining the k-means algorithm and KPSO algorithm to improve resource utilization. Ref. [17] proposes a hybrid topology crisscross optimization algorithm (HTCSO) to address the slow convergence and local optima of traditional crisscross optimization (CSO) algorithms. It uses a hybrid topology combining a ring regular topology and a random topology to improve the convergence accuracy and convergence speed of the CSO algorithm. Ref. [18] puts forward a game-theoretic topology planning framework for power communication networks from the perspective of network attacks, and uses Bayesian decision theory and the particle swarm optimization algorithm to solve the problem of grid resistance to communication attacks step by step. However, for large-scale power grid scenarios, these methods require a large amount of data and computation, which poses challenges for hardware equipment and real-time data performance.

2. System Architecture

This paper focuses on the optimization of the topology of the entire power grid, abandoning the traditional mode of local adjustment for a single device and focusing on a solution from a global perspective. On the constructed microgrid experimental platform, detailed electrical characteristic parameterization is performed for each branch, covering key indicators such as resistance, reactance, and rated current to ensure the authenticity and reliability of the model. Subsequently, sudden faults are simulated at selected nodes to trigger a system response and verify the adaptability and robustness of this method in the face of power grid disturbances.

The proposed solution relies on the deep Q-network (DQN), a deep reinforcement learning technique that combines the advantages of deep learning and reinforcement learning algorithms [19,20,21] and is particularly suitable for handling decision-making problems in high-dimensional and complex state spaces. The DQN is well-suited for problems with discrete action spaces. Power grid dispatching problems, especially those involving the switching of generator sets and lines, are usually discrete. Therefore, the DQN shows good adaptability in dealing with such problems. Compared with algorithms that deal with continuous action spaces (such as Actor–Critic methods, Policy Gradients, etc.), the implementation of the DQN is relatively simple and intuitive. It is based on the Q-learning framework and trains a deep neural network to approximate the Q-value function to select the best action. This simplicity facilitates rapid prototyping and testing.

In this study, the state changes of line switches in each branch of the power grid are regarded as key decision variables, and the load fluctuations caused by the switching operation are regarded as the dynamic feedback of the system state. This process is naturally mapped to the Markov decision process (MDP) [22,23,24] framework, in which state, action, reward, and transition probability constitute the four major elements of the MDP.

The network topology optimization method based on deep reinforcement learning proposed in this paper takes the open and closed state (SS) of each branch switch in the power grid and the operating state (SN) of each node as the state space:

SS = \{SS_{1}, SS_{2}, SS_{3}, \dots, SS_{N_{all-switch}}\}, \quad SS_{i} = \begin{cases} 1, & \text{when switch } i \text{ is in the closed state} \\ 0, & \text{when switch } i \text{ is in the open state} \end{cases}

SN = \{SN_{1}, SN_{2}, SN_{3}, \dots, SN_{N_{all-node}}\}, \quad SN_{i} = \begin{cases} 1, & \text{when node } i \text{ is in the normal state} \\ 0, & \text{when node } i \text{ is in the faulty state} \end{cases}

So, the state space is represented as

S = \begin{cases} SS = \{SS_{1}, SS_{2}, SS_{3}, \dots, SS_{N_{all-switch}}\} \\ SN = \{SN_{1}, SN_{2}, SN_{3}, \dots, SN_{N_{all-node}}\} \end{cases}

where N_{all-switch} is the total number of branch switches and N_{all-node} is the total number of microgrid internal nodes.

The action space refers to the range of operations that can be performed by the agent. In the optimization problem of the microgrid topology proposed in this paper, the agent changes the topology of the grid by controlling the on–off switch of each branch. Therefore, the action space of the agent in this paper is as follows:

A = \{a_{0}, a_{1}, \dots, a_{N_{all-node}}\}, \quad a_{i} = \begin{cases} 1, & \text{turn switch } i \text{ on} \\ 0, & \text{turn switch } i \text{ off} \end{cases}
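One possible encoding of the state and action spaces above is sketched here in Python. It is an illustration under stated assumptions rather than the authors' implementation: the switch count of 37 (32 branch switches plus 5 tie switches), the flat action index that sets one switch per step, and the names `encode_state`, `decode_action`, and `apply_action` are all hypothetical.

```python
import numpy as np

N_SWITCH = 37  # assumption: 32 branch switches + 5 tie-line switches
N_NODE = 33    # nodes in the IEEE-33 system


def encode_state(ss, sn):
    """Concatenate switch states SS and node states SN into one vector S."""
    ss = np.asarray(ss, dtype=np.float32)
    sn = np.asarray(sn, dtype=np.float32)
    assert ss.shape == (N_SWITCH,) and sn.shape == (N_NODE,)
    return np.concatenate([ss, sn])


def decode_action(k):
    """Map a flat action index k to (switch index, target state).

    Assumption: each discrete action sets a single switch on (1) or off (0),
    giving 2 * N_SWITCH actions in total.
    """
    return k // 2, k % 2


def apply_action(ss, k):
    """Return a copy of SS with the chosen switch set to its target state."""
    i, target = decode_action(k)
    new_ss = np.array(ss, dtype=np.float32, copy=True)
    new_ss[i] = target
    return new_ss


# Example: all branch switches closed, all tie switches open, node 10 faulty.
ss = np.array([1] * 32 + [0] * 5, dtype=np.float32)
sn = np.ones(N_NODE, dtype=np.float32)
sn[10 - 1] = 0  # node numbering starts at 1, array index at 0
state = encode_state(ss, sn)

# Action 73 decodes to (switch 36, on): close the last tie switch.
new_ss = apply_action(ss, 73)
```

Restricting each step to one switch keeps the number of discrete actions linear in the number of switches, which suits a DQN output layer; a full on/off vector per step would grow exponentially.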

In reinforcement learning, “rewards” often represent the quality of the “actions” taken by the agent. Therefore, “rewards” are often related to the indicators that the system optimizes. The failure rate of microgrid internal nodes, ρ, is taken as the optimization target, where N_{all-node} is the total number of microgrid internal nodes and N_{failure-node} is the total number of microgrid internal fault nodes:

ρ = \frac{N_{all-node} - N_{failure-node}}{N_{all-node}},

(1)
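A short numeric check of Formula (1) may help. Evaluated literally, the expression gives the share of nodes that are not faulty, so its value rises as fewer nodes fail, which is the direction a reward should move. The function name `rho` and the nine-node outage used below are illustrative assumptions, not data from the paper.

```python
def rho(sn):
    """Evaluate Formula (1) from the node-state vector SN (1 normal, 0 faulty)."""
    n_all = len(sn)
    n_failure = sum(1 for x in sn if x == 0)
    return (n_all - n_failure) / n_all


# Hypothetical case: 33 nodes, with nodes 10 to 18 out of service.
sn = [1] * 33
for node in range(10, 19):
    sn[node - 1] = 0

value = rho(sn)          # (33 - 9) / 33
healthy = rho([1] * 33)  # no faulty nodes
```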

The framework of the network topology optimization method based on deep reinforcement learning proposed in this paper is shown in Figure 2. After the microgrid topology structure is modeled, the state (SS, SN) is input into the DQN to obtain the optimal action in this state (the action is to reverse the switch state of some branches inside the power grid), and then the load distribution and internal node operation state in the new state of the microgrid are calculated to obtain the reward of this step; finally, the parameters of the deep Q-network are updated.

The DQN is trained iteratively on this MDP in order to learn a policy that maximizes the expected long-term reward. In each iteration, the DQN model selects the most favorable switching operation sequence based on the current grid status and past experience to minimize the impact of the fault, restore the power supply, and optimize the grid performance. This process is not only a dynamic reconfiguration of the grid topology, but also a comprehensive improvement of system stability and efficiency.

3. Model

This section aims to model the actual environment of the power grid and simulate the operation of a community-level microgrid based on the IEEE-33 node system. Subsequently, a reinforcement learning environment will be constructed for the grid node system, and a deep reinforcement learning model will be employed to optimize the grid topology in case of a faulty node within the grid.

3.1. IEEE-33 Node System Construction

The IEEE-33 [25] node system is a model widely used in power system research. It simulates a distribution network with 33 nodes and 32 transmission lines. This system is often used as a benchmark to evaluate and compare the performance of various power system analysis methods and algorithms. In the IEEE-33 node system, each node represents a power demand point or supply point, and each transmission line represents the connection between nodes. The system parameters include the voltage, current, power factor, load, etc., of the node. In addition, electrical characteristic parameters such as the impedance and admittance of the line are also included. The main purpose of the IEEE-33 node system is to conduct various analyses and studies of the power system, such as power flow calculation, fault analysis, optimization control, etc. Through the study of this system, engineers can better understand and master the operating laws of the power system and provide a theoretical basis and technical support for the actual design and operation of the power system.

This paper uses an improved IEEE-33 node system to build the environment. In addition to the traditional power components, photovoltaic power generation equipment is added to simulate the operation of modern new power systems. Five tie lines are added to the classic IEEE-33 node system. The tie lines are backup lines and are normally open. They are mainly used for topological adjustments during microgrid fault maintenance and peak power periods. The grid topology structure built in this paper is shown in Figure 3.
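The structure described above can be sketched as a plain edge list. The branch and tie-line endpoints below follow the commonly used 33-bus benchmark layout; whether the paper's five tie lines connect exactly these node pairs is an assumption, since the text does not list them. The helper names are hypothetical.

```python
from collections import deque

N_NODES = 33

# 32 normally closed branches (standard 33-bus benchmark layout).
BRANCHES = (
    [(i, i + 1) for i in range(1, 18)]           # main feeder 1-2-...-18
    + [(2, 19), (19, 20), (20, 21), (21, 22)]    # lateral leaving node 2
    + [(3, 23), (23, 24), (24, 25)]              # lateral leaving node 3
    + [(6, 26)]
    + [(i, i + 1) for i in range(26, 33)]        # lateral 26-27-...-33
)

# 5 normally open tie lines (assumed to match the benchmark's tie switches).
TIE_LINES = [(8, 21), (9, 15), (12, 22), (18, 33), (25, 29)]


def build_adjacency(edges, n_nodes=N_NODES):
    """Build an undirected adjacency map from a list of closed lines."""
    adj = {n: set() for n in range(1, n_nodes + 1)}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def reachable_from(adj, source=1):
    """Breadth-first search: the set of nodes connected to the source node."""
    seen = {source}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return seen


# With all tie lines open the network is radial: connected, with n - 1 lines.
adj = build_adjacency(BRANCHES)
is_radial = len(reachable_from(adj)) == N_NODES and len(BRANCHES) == N_NODES - 1
```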

3.2. DQN Model

The DQN (deep Q-network) model [26,27,28,29] is an algorithm that combines deep learning and reinforcement learning. It was proposed by DeepMind to solve the Markov decision process problem in a discrete action space. The DQN combines deep neural networks with Q-learning algorithms and applies deep learning to reinforcement learning tasks, greatly expanding the application scope and performance of reinforcement learning.

3.2.1. Fundamental

The core idea of the DQN is to use deep neural networks to approximate the action–value function Q(s, a) in Q-learning, where s represents the current state and a represents the action taken in state s. Q(s, a) represents the expected return of taking action a in state s. Traditional Q-learning algorithms maintain a Q-table to record the Q-value of each state–action pair, but this method becomes infeasible when the state space is large or continuous. The DQN estimates these Q-values through neural networks, so that it can handle high-dimensional state spaces.

Therefore, the DQN inherits the value update criterion of the Q-learning method, as shown in Formula (2), which updates the action–value function Q(s, a) in temporal-difference form, where α is the learning rate, γ is the discount factor, and R_{t} is the reward received at step t.

Q(s_{t}, a_{t}) \leftarrow Q(s_{t}, a_{t}) + α [R_{t} + γ \max_{a} Q(s_{t+1}, a) - Q(s_{t}, a_{t})],

(2)
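To make Formula (2) concrete, the following tabular sketch applies one update step. The state labels, action names, and numeric values are made up for illustration; a DQN replaces the table with a neural network but keeps the same target.

```python
from collections import defaultdict


def q_update(q, s, a, r, s_next, actions, alpha, gamma):
    """Apply one Q-learning step of Formula (2) to the table q, in place."""
    best_next = max(q[(s_next, b)] for b in actions)
    td_error = r + gamma * best_next - q[(s, a)]
    q[(s, a)] += alpha * td_error
    return q[(s, a)]


# Hypothetical table entries and transition.
actions = ["close_tie", "open_branch"]
q = defaultdict(float)
q[("s0", "close_tie")] = 0.5
q[("s1", "close_tie")] = 2.0
q[("s1", "open_branch")] = 1.0

new_value = q_update(q, "s0", "close_tie", r=1.0, s_next="s1",
                     actions=actions, alpha=0.1, gamma=0.9)
# 0.5 + 0.1 * (1.0 + 0.9 * 2.0 - 0.5) = 0.73
```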

In the original Q-learning algorithm, each experience sample is used to update the value function only once and is then discarded, which limits the efficiency and effect of learning to a certain extent. In order to bridge the gap between Q-learning and deep neural networks, the DQN algorithm introduces the “experience replay” mechanism.

The core idea of the experience replay mechanism is to set up a memory bank, specifically for storing the experiences generated during the interaction between the agent and the environment, that is, the four-tuple data consisting of “state, action, reward, and subsequent state”. Every time the agent takes an action in the environment, the relevant experience samples are stored in this memory bank. In the subsequent Q-network training phase, the algorithm no longer directly relies on the latest environmental feedback, but randomly extracts a batch of samples from the memory bank as training input. The subtlety of this design lies in two points:

Breaking sequence dependency: by disrupting the order of samples, the experience replay mechanism helps to reduce the temporal correlation of training data and avoids the gradient estimation offset caused by a strong correlation between consecutive samples, thereby improving the generalization ability of the model.
Data reuse: the reuse of experience samples not only improves the utilization rate of data, but also enhances the adaptability of the model to environmental dynamics, especially in scenarios where data are scarce or the acquisition cost is high.
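The memory bank described above can be sketched as a fixed-size buffer with uniform random sampling. The class name `ReplayBuffer`, the capacity, and the batch size are illustrative assumptions; the paper does not state its buffer settings.

```python
import random
from collections import deque, namedtuple

# The four-tuple stored for every interaction step.
Transition = namedtuple("Transition", ["state", "action", "reward", "next_state"])


class ReplayBuffer:
    """Fixed-capacity memory; the oldest experiences are dropped first."""

    def __init__(self, capacity, seed=None):
        self.memory = deque(maxlen=capacity)
        self.rng = random.Random(seed)

    def push(self, state, action, reward, next_state):
        self.memory.append(Transition(state, action, reward, next_state))

    def sample(self, batch_size):
        # Uniform sampling without replacement breaks the temporal ordering.
        return self.rng.sample(list(self.memory), batch_size)

    def __len__(self):
        return len(self.memory)


# Usage with placeholder integer states.
buffer = ReplayBuffer(capacity=100, seed=0)
for t in range(150):
    buffer.push(t, t % 4, 0.0, t + 1)
batch = buffer.sample(32)
```

Because the buffer holds only the latest 100 transitions here, every sampled state comes from steps 50 to 149, and the same stored transition can be drawn again in later batches.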

3.2.2. Process

The innovative deep reinforcement learning method proposed in this paper aims to optimize the network topology of microgrids, especially in the face of node failure challenges. The core of this strategy is to use the intelligent decision-making ability of the DQN to achieve the dynamic adjustment of the microgrid topology to enhance the stability and efficiency of the system. The specific process is shown in Figure 4.

First, the microgrid is initialized to establish its initial architecture and parameter settings, including the status of each node and the connection relationship between them. At the same time, the DQN model is initialized, including the network architecture initialization and model parameter initialization.

Next, we enter the model training phase. In order to simulate real fault conditions, some nodes are randomly set to fail in the simulated microgrid environment to generate training samples. The DQN model receives the current state information of the microgrid, including the location of the faulty node and the overall topology of the microgrid, and then selects and executes the most appropriate action based on this information, such as reconfiguring network connections or closing backup lines, to optimize the microgrid structure. After each action is executed, the microgrid enters a new state, and a reward is calculated for the new state. The reward is the current microgrid node failure rate ρ. The model parameters are then updated by backpropagation according to the current reward, and the optimal model is obtained after multiple iterations.
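Two pieces of the training step above can be sketched with NumPy: choosing an action from the network's Q-value outputs and forming the regression targets for a sampled batch. The epsilon-greedy exploration rule and the `dones` end-of-episode flags are standard DQN ingredients assumed here; the paper does not say which exploration scheme or termination rule it uses.

```python
import numpy as np


def epsilon_greedy(q_values, epsilon, rng):
    """Pick a random action with probability epsilon, else the greedy one."""
    if rng.random() < epsilon:
        return int(rng.integers(len(q_values)))
    return int(np.argmax(q_values))


def td_targets(rewards, next_q_values, dones, gamma):
    """Targets R + gamma * max_a Q(s', a); no bootstrap on terminal steps."""
    rewards = np.asarray(rewards, dtype=np.float64)
    next_q_values = np.asarray(next_q_values, dtype=np.float64)
    dones = np.asarray(dones, dtype=np.float64)
    return rewards + gamma * (1.0 - dones) * next_q_values.max(axis=1)


rng = np.random.default_rng(0)

# With epsilon = 0 the choice is purely greedy: index of the largest Q-value.
greedy_action = epsilon_greedy(np.array([0.1, 0.7, 0.3]), epsilon=0.0, rng=rng)

# Two hypothetical transitions; the second one ends its episode.
targets = td_targets(
    rewards=[0.9, 1.0],
    next_q_values=[[1.0, 2.0], [3.0, 0.0]],
    dones=[0, 1],
    gamma=0.5,
)
# targets -> [0.9 + 0.5 * 2.0, 1.0] = [1.9, 1.0]
```

The network is then fitted so that its output for each stored state and action moves toward the matching target, typically by minimizing a squared error with gradient descent.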

After the model training is completed, it is deployed in the actual microgrid environment as an intelligent tool for real-time monitoring and response to faults. When the microgrid encounters a node failure, the system immediately inputs the fault status and current topology data into the trained DQN model. The model quickly analyzes this information and outputs an optimal topology state adjustment strategy to guide the microgrid on how to adjust its network structure immediately and effectively to minimize the impact of the fault and restore the stability and reliability of the power supply.

4. Experimental Settings

This section experimentally verifies the proposed method on several node failures that occur frequently in microgrids and sets up control experiments. The control groups are the particle swarm optimization algorithm and the case without any optimization. The particle swarm optimization algorithm is a heuristic global optimization method inspired by the social behavior of bird flocks in flight. It aims to solve complex optimization problems by simulating the process of birds looking for food.

4.1. Single Node Failure in Microgrid

A single node failure refers to a situation where a key device or node in the microgrid fails. It includes the following types. Power source failures, such as a power failure, a sudden shutdown of new energy sources, or a failure of energy storage equipment, cause a power imbalance in the microgrid and affect the stability of the power supply. Load failures or abnormally large currents may cause an overload or short circuit, thus affecting the safe operation of the entire microgrid. Line failures, including short circuits, open circuits, or insulation damage caused by line aging, affect power transmission and may even cause safety problems such as fire. Control equipment failures, such as the failure of protective equipment like circuit breakers and relays, may make it impossible to isolate the fault in time, further expanding the scope of impact.

In the experimental verification of this scenario, a single node inside the microgrid is selected and set to the fault state for a specified period of time. The experiment verifies the microgrid topology optimization effect of the algorithm proposed in this article and compares it with the control group. The experimental settings are shown in Table 1.

Node 10 is a PV unit, a type of node with a high incidence of failure in daily microgrid operation. Taking hours as the minimum unit, it is set to fail in 1–3 h and the maintenance time is 2 h. As can be seen from Figure 5, when the PV device at node 10 fails and the microgrid topology is left unchanged, all loads at nodes 11–18 lose power, seriously disrupting users’ production and daily life.
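The outage of nodes 11–18 can be reproduced with a simple reachability check. The sketch assumes the standard 33-bus branch layout with node 1 as the supply point, and it closes tie line 18–33 purely as an illustrative remedy; the switching actually selected in the paper is reported in its Table 2, which is not reproduced here.

```python
from collections import deque

N_NODES = 33

# Assumed standard 33-bus branch list (all branches normally closed).
BRANCHES = (
    [(i, i + 1) for i in range(1, 18)]
    + [(2, 19), (19, 20), (20, 21), (21, 22)]
    + [(3, 23), (23, 24), (24, 25)]
    + [(6, 26)]
    + [(i, i + 1) for i in range(26, 33)]
)


def energized_nodes(closed_lines, failed, source=1):
    """Nodes still reachable from the source when failed nodes carry no power."""
    adj = {n: [] for n in range(1, N_NODES + 1)}
    for u, v in closed_lines:
        if u not in failed and v not in failed:
            adj[u].append(v)
            adj[v].append(u)
    seen = {source}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return seen


def affected_nodes(closed_lines, failed):
    """Failed nodes plus every node cut off from the source."""
    return set(range(1, N_NODES + 1)) - energized_nodes(closed_lines, failed)


# Node 10 fails with the original topology: nodes 11-18 are cut off as well.
before = affected_nodes(BRANCHES, failed={10})

# Illustrative reconfiguration: close tie line 18-33 to back-feed nodes 11-18.
after = affected_nodes(BRANCHES + [(18, 33)], failed={10})
```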

The network topology is optimized for the single node failure described above. The topology is adjusted with the goal of reducing the internal node failure rate of the microgrid, ρ. In this process, the power grid must not form a loop, because in a ring-configured microgrid, circulating currents lead to unnecessary energy consumption and may cause equipment overload risks, posing a threat to the reliability of the system. At the same time, fault detection and isolation under the ring structure are more complicated than in linear or radial networks, requiring more sophisticated protection strategies and more advanced control technologies, which increase the complexity and cost of the system.
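The no-loop requirement can be enforced by filtering candidate switching actions before they are applied. The sketch below uses a union-find structure to detect whether a set of closed lines contains a cycle. The branch and tie-line endpoints are the standard 33-bus benchmark values, assumed rather than taken from the paper, and `creates_loop` is a hypothetical helper name.

```python
N_NODES = 33

# Assumed standard 33-bus layout: 32 branches and 5 tie lines.
BRANCHES = (
    [(i, i + 1) for i in range(1, 18)]
    + [(2, 19), (19, 20), (20, 21), (21, 22)]
    + [(3, 23), (23, 24), (24, 25)]
    + [(6, 26)]
    + [(i, i + 1) for i in range(26, 33)]
)
TIE_LINES = [(8, 21), (9, 15), (12, 22), (18, 33), (25, 29)]


def creates_loop(lines, failed):
    """Return True if the closed lines between healthy nodes contain a cycle."""
    parent = list(range(N_NODES + 1))

    def find(x):
        # Find the set representative, halving the path along the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for u, v in lines:
        if u in failed or v in failed:
            continue  # lines touching a failed node carry no current
        root_u, root_v = find(u), find(v)
        if root_u == root_v:
            return True  # both ends already connected: this line closes a ring
        parent[root_u] = root_v
    return False


# With node 10 out of service, which single tie line can be closed safely?
feasible = [t for t in TIE_LINES
            if not creates_loop(BRANCHES + [t], failed={10})]

# Without any fault, closing a tie line alone always forms a ring.
loop_without_fault = creates_loop(BRANCHES + [(8, 21)], failed=set())
```

In this example only the tie lines that reach into the isolated section 11–18 pass the check, because the other two would connect nodes that are already linked through the healthy part of the network.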

After model processing, the proposed method and the particle swarm optimization algorithm of the control group obtain the same optimization results, as shown in Figure 6. Under a single node failure, the changes in the internal fault nodes of the microgrid are shown in Figure 7. The impact range of the node failure is reduced by 21% compared with the case without optimization.

The changes in network topology are shown in Table 2.

4.2. Multi-Node Failure in Microgrid

Microgrid multi-node failure refers to a fault condition that occurs simultaneously or successively at two or more nodes in a microgrid system. These faults may include but are not limited to an overload, short circuit, open circuit, or power failure, and they may occur in multiple locations such as power generation units, energy storage systems, load ends, or connecting lines. Multi-node failures pose a major challenge to the stable operation of microgrids, as they may cause power outages in some microgrids or even the entire system, affect power quality, increase the complexity of fault diagnosis and recovery, and may trigger chain reactions, further expanding the scope of the fault. Therefore, for the design and management of microgrids, special attention must be given to the prevention and rapid response capabilities of multi-node failures to ensure the reliability and resilience of the system.

In the experimental verification of this scenario, several nodes inside the microgrid are selected and set to the fault state for a specified period. The experiment verifies the microgrid topology optimization effect of the algorithm proposed in this paper and compares it with the control group. The experimental settings are shown in Table 3.

Taking hours as the minimum unit, it is assumed that the failure occurs in 1–4 h and the maintenance time is 3 h. As shown in Figure 8, when the fault occurs and the microgrid topology is left unchanged, more than two-thirds of the microgrid falls within the fault area.

The network topology is optimized for the above multi-node failures. Under the premise of avoiding a closed-loop operation, the topology structure is changed to keep the internal failure rate of the microgrid at the lowest rate. After the optimization and adjustment, the new microgrid topology is obtained, as shown in Figure 9.

Under multi-node failures, the changes of faulty nodes inside the microgrid are shown in Figure 10. The impact range of node failures is reduced by 58% compared with the case without optimization.

The changes in network topology are shown in Table 4.

5. Conclusions

The deep reinforcement learning network topology optimization method (DRL-NNF) proposed in this paper can effectively control grid faults from the microgrid topology level by building models, training models, and deploying models for microgrids, which can greatly improve the fault defense capability of microgrids. The model proposed in this paper has the following three contributions:

From the research method and entry point, the model comprehensively considers the topology of the microgrid, rather than focusing only on a single component or local problem, thereby achieving comprehensive security optimization and ensuring that the various power nodes within the microgrid work together to pursue system-level security maximization.
In terms of the ability to cope with the complex power grid environment, this method has made a breakthrough. Using the powerful ability of deep reinforcement learning, DRL-NNF can cope with high-dimensional state spaces and has advantages in dealing with modern microgrids with complex structures and dynamic characteristics. DRL-NNF uses the powerful characterization capability of deep networks to avoid the complex problems of traditional methods in the face of complex topologies.
In terms of the universality of the method, the scope of application of DRL-NNF is not limited to fault control. This method is also applicable to other key areas in the operation of microgrids, such as load distribution, line loss reduction, and energy efficiency improvement. Through the migration and application of models, it is possible to flexibly switch between different optimization objectives without training new models from scratch, which greatly saves time and computing resources, and also enhances the flexibility and efficiency of microgrids in multi-faceted performance optimization.

Author Contributions

Conceptualization, X.D. and X.L.; methodology, X.D.; software, W.C.; validation, X.D., X.L. and W.C.; formal analysis, R.L. and D.L.; investigation, X.L.; resources, W.C.; data curation, R.L.; writing—original draft preparation, Q.Y., X.M. and D.L.; writing—review and editing, X.L., X.M. and D.L.; visualization, Q.Y.; supervision, X.D.; project administration, X.D.; funding acquisition, X.D. and D.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by State grid information and communication Industry Group Co., Ltd. Science and technology innovation project, grant number SGIT0000XMJS2310456.

Data Availability Statement

No new data were created or analyzed in this study. Data sharing is not applicable to this article.

Acknowledgments

Thanks are extended to the State Grid Information and Communication Industry Group Co., Ltd., as well as the Department of Electronics and Information Science of Xi’an Jiaotong University, for their financial and technical support to this article.

Conflicts of Interest

Mrs. Xueying Ding, Mr. Xiao Liao, Mr. Wei Cui, and Mr. Xiangliang Meng are from State Grid Information and Telecommunication Group Co., Ltd. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

References

1. Taouil, K.; Aloulou, R.; Bradai, S.; Gassara, A.; Kharrat, M.W.; Louati, B.; Giordani, M. P2P Energy Exchange Architecture for Swarm Electrification-Driven PV Communities. Energies 2024, 17, 3680.
2. Khazali, A.; Al-Wreikat, Y.; Fraser, E.J.; Sharkh, S.M.; Cruden, A.J.; Naderi, M.; Smith, M.J.; Palmer, D.; Gladwin, D.T.; Foster, M.P.; et al. Planning a Hybrid Battery Energy Storage System for Supplying Electric Vehicle Charging Station Microgrids. Energies 2024, 17, 3631.
3. Fresia, M.; Robbiano, T.; Caliano, M.; Delfino, F.; Bracco, S. Optimal Operation of an Industrial Microgrid within a Renewable Energy Community: A Case Study of a Greentech Company. Energies 2024, 17, 3567.
4. Naseri, N.; Aboudrar, I.; El Hani, S.; Ait-Ahmed, N.; Motahhir, S.; Machmoum, M. Energy Transition and Resilient Control for Enhancing Power Availability in Microgrids Based on North African Countries: A Review. Appl. Sci. 2024, 14, 6121.
5. Abeg, A.I.; Islam, M.R.; Hossain, M.A.; Ishraque, M.F.; Islam, M.R.; Hossain, M.J. Capacity and operation optimization of hybrid microgrid for economic zone using a novel meta-heuristic algorithm. J. Energy Storage 2024, 94, 112314.
6. Aiswarya, R.; Nair, D.S.; Rajeev, T.; Vinod, V. A novel SVM based adaptive scheme for accurate fault identification in microgrid. Electr. Power Syst. Res. 2023, 221, 109439.
7. Jalli, R.K.; Mishra, S.P.; Dash, P.K.; Naik, J. Fault analysis of photovoltaic based DC microgrid using deep learning randomized neural network. Appl. Soft Comput. J. 2022, 126, 109314.
8. Sistani, A.; Hosseini, S.A.; Sadeghi, V.S.; Taheri, B. Fault Detection in a Single-Bus DC Microgrid Connected to EV/PV Systems and Hybrid Energy Storage Using the DMD-IF Method. Sustainability 2023, 15, 16269.
9. Biswal, C.; Sahu, B.K.; Mishra, M.; Rout, P.K. Real-Time Grid Monitoring and Protection: A Comprehensive Survey on the Advantages of Phasor Measurement Units. Energies 2023, 16, 4054.
10. Najafzadeh, M.; Pouladi, J.; Daghigh, A.; Beiza, J.; Abedinzade, T. Fault Detection, Classification and Localization Along the Power Grid Line Using Optimized Machine Learning Algorithms. Int. J. Comput. Intell. Syst. 2024, 17, 49.
11. Taifeng, C.; Chunbo, L. Soft computing based smart grid fault detection using computerised data analysis with fuzzy machine learning model. Sustain. Comput. Inform. Syst. 2024, 41, 100945.
12. Wu, Y.; Xiao, F.; Liu, F.; Sun, Y.; Deng, X.; Lin, L.; Zhu, C. A Visual Fault Detection Algorithm of Substation Equipment Based on Improved YOLOv5. Appl. Sci. 2023, 13, 11785.
13. Fu, L.; Li, C.; Wang, B.; Ban, Y. Fault Location of Grid-connected Microgrid Lines Based on AVMD and Double-ended Traveling Wave Ranging. J. Phys. Conf. Ser. 2023, 2662, 012028.
14. Pan, P.; Mandal, R.K.; Rahman Redoy Akanda, M.M. Fault Classification with Convolutional Neural Networks for Microgrid Systems. Int. Trans. Electr. Energy Syst. 2022, 2022, 8431450.
15. Liu, Y.; Zhang, S.; Li, L.; Wang, S.; Lu, T.; Yu, H.; Liu, W. A machine learning-based fault identification method for microgrids with distributed generations. J. Phys. Conf. Ser. 2022, 2360, 012019.
16. Wu, H.; Ding, D.; She, Y.; Wang, L.; Ji, B.; Chen, T. Topology Optimization of Regional Power Grid Under Large-Scale Access of Distributed Photovoltaic Power Generation. J. Nanoelectron. Optoelectron. 2022, 17, 1648–1654.
17. Xu, X.; Chen, S.; Zhu, Z.; Meng, A.; Zhang, J.; Wen, H.; Zhang, K. Solving dynamic economic dispatch problem of power grid based on hybrid topology crisscross optimization algorithm. In Proceedings Volume 12330, Proceedings of the International Conference on Cyber Security, Artificial Intelligence, and Digital Economy (CSAIDE 2022), Huzhou, China, 15–18 April 2022; Guangdong Univ. of Technology (China): Guangzhou, China; Meizhou Power Supply Bureau of Guangdong Power Grid Co., Ltd. (China): Meizhou, China; Zhuhai Huacheng Power Engineering Consultants Co., Ltd. (China): Zhuhai, China, 2022.
18. Wu, Y.; Chen, J.; Ru, Y.; Xu, H.; Roger, M.; Ni, M. Research on Power Communication Network Planning Based on Information Transmission Reachability Against Cyber-Attacks. IEEE Syst. J. 2020, 15, 2883–2894.
19. Qin, S.; Zhang, X.; Wang, J.; Guo, X.; Qi, L.; Cao, J.; Liu, Y. An Improved Q-Learning Algorithm for Optimizing Sustainable Remanufacturing Systems. Sustainability 2024, 16, 4180.
20. Tresca, L.; Pulvirenti, L.; Rolando, L.; Millo, F. Development of a deep Q-learning energy management system for a hybrid electric vehicle. Transp. Eng. 2024, 16, 100241.
21. Zhang, Q.; Wu, C.; Tian, H.; Gao, Y.; Yao, W.; Wu, L. Safety reinforcement learning control via transfer learning. Automatica 2024, 166, 111714.
22. Emek, Y.; Lavi, R.; Niazadeh, R.; Shi, Y. Stateful Posted Pricing with Vanishing Regret via Dynamic Deterministic Markov Decision Processes. Math. Oper. Res. 2024, 49, 880–900.
23. Li, Q.; Lin, T.; Yu, Q.; Du, H.; Li, J.; Fu, X. Review of Deep Reinforcement Learning and Its Application in Modern Renewable Power System Control. Energies 2023, 16, 4143.
24. Meng, T.; Li, X.; Zhang, S.; Zhao, Y. A Hybrid Secure Scheme for Wireless Sensor Networks against Timing Attacks Using Continuous-Time Markov Chain and Queueing Model. Sensors 2016, 16, 1606.
Habib, K.; Habib, S.; Khan, S.; Jafaripournimchahi, A.; Xing, X. A hybrid optimization approach for strategically placing electric vehicle charging stations in a radial distribution IEEE-33 bus system. Eng. Res. Express 2024, 6, 025344. [Google Scholar] [CrossRef]
Liu, Y.; Ding, W.; Yang, M.; Zhu, H.; Liu, L.; Jin, T. Distributed Drive Autonomous Vehicle Trajectory Tracking Control Based on Multi-Agent Deep Reinforcement Learning. Mathematics 2024, 12, 1614. [Google Scholar] [CrossRef]
Lu, C.; Xuan, D.; Liu, S.; Tan, J.; Hu, H.; Kang, Z.; Lin, L. Active equalization control method for battery pack based on Double-DQN. J. Energy Storage 2024, 88, 111361. [Google Scholar] [CrossRef]
Yang, X.; Liu, P.; Liu, F.; Liu, Z.; Wang, D.; Zhu, J.; Wei, T. A DOD-SOH balancing control method for dynamic reconfigurable battery systems based on DQN algorithm. Front. Energy Res. 2023, 11, 1333147. [Google Scholar] [CrossRef]
Wang, Z.; Li, X.; Sun, L.; Zhang, H.; Liu, H.; Wang, J. Learning State-Specific Action Masks for Reinforcement Learning. Algorithms 2024, 17, 60. [Google Scholar] [CrossRef]

Figure 1. Simple topology structure of new power system microgrid.

Figure 2. Framework of network topology optimization method based on deep reinforcement learning.

Figure 3. The grid topology structure built in this paper.

Figure 4. Detailed flow chart of the proposed method.

Figure 5. Single node failure.

Figure 6. Topology adjustment result of single node failure.

Figure 7. Changes in fault nodes inside the microgrid under single node failure.

Figure 8. Multi-node failure.

Figure 9. Topology adjustment result of multi-node failure.

Figure 10. Changes in fault nodes inside the microgrid under multi-node failure.

Table 1. The experimental settings of single node failure in microgrid.

Faulty Node ID	Faulty Node Type	Fault Duration/h	Fault Period
10	PV	2	1–3

Table 2. The changes in network topology under single node failure.

Nodes Involved	Before Adjustment	After Adjustment
11–18	10-11-12-13-14-15-16-17-18	22-12-11; 22-12-13-14-15-16-17-18

Table 3. The experimental settings of multi-node failure in microgrid.

Faulty Node ID	Faulty Node Type	Fault Duration/h	Fault Period
5	PV	3	1–4
14	Load	3	1–4
28	Load	3	1–4

Table 4. The changes in network topology under multi-node failure.

Nodes Involved	Before Adjustment	After Adjustment
6–18; 26–33	6-7-8-9-10-11-12-13-14-15-16-17-18; 6-26-27-28-29-30-31-32-33	21-8-7-6-26-27; 21-8-9-10-11-12-13; 21-8-9-15-16-17-18-33-32-31-30-29
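
One way to read the paths in Tables 2 and 4 is to check which faulty nodes still lie on a supply path before and after the topology adjustment. The Python sketch below does only that. The helper names (`parse_path`, `nodes_on_paths`, `faulty_nodes_on_paths`) are hypothetical and are not taken from the paper, and the check is a simple illustration of the tabulated data, not the DQN-based optimization or the failure-rate metric used by the authors.

```python
def parse_path(path: str) -> list[int]:
    """Turn a hyphen-separated path such as '22-12-11' into [22, 12, 11]."""
    return [int(node) for node in path.split("-")]


def nodes_on_paths(paths: list[str]) -> set[int]:
    """Collect every node that appears on at least one path."""
    return {node for path in paths for node in parse_path(path)}


def faulty_nodes_on_paths(paths: list[str], faulty: set[int]) -> set[int]:
    """Return the faulty nodes that still lie on one of the given paths."""
    return nodes_on_paths(paths) & faulty


# Single node failure: faulty node from Table 1, paths from Table 2.
single_faulty = {10}
single_before = ["10-11-12-13-14-15-16-17-18"]
single_after = ["22-12-11", "22-12-13-14-15-16-17-18"]

# Multi-node failure: faulty nodes from Table 3, paths from Table 4.
multi_faulty = {5, 14, 28}
multi_before = [
    "6-7-8-9-10-11-12-13-14-15-16-17-18",
    "6-26-27-28-29-30-31-32-33",
]
multi_after = [
    "21-8-7-6-26-27",
    "21-8-9-10-11-12-13",
    "21-8-9-15-16-17-18-33-32-31-30-29",
]

for label, before, after, faulty in [
    ("single", single_before, single_after, single_faulty),
    ("multi", multi_before, multi_after, multi_faulty),
]:
    print(
        label,
        "before:", sorted(faulty_nodes_on_paths(before, faulty)),
        "after:", sorted(faulty_nodes_on_paths(after, faulty)),
    )
```

In both cases the adjusted paths listed in the tables contain none of the faulty nodes. Node 5 does not appear on the tabulated paths in Table 4 either before or after adjustment, so this check reports only nodes 14 and 28 for the original topology.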

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

