
Action Set Based Policy Optimization for Safe Power Grid Management

  • Conference paper
Machine Learning and Knowledge Discovery in Databases. Applied Data Science Track (ECML PKDD 2021)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 12979)


Abstract

Maintaining the stability of the modern power grid is becoming increasingly difficult due to fluctuating power consumption, the unstable power supply from renewable energy sources, and unpredictable accidents such as man-made and natural disasters. Because operations on the power grid must account for their impact on future stability, reinforcement learning (RL) has been employed to provide sequential decision-making in power grid management. However, existing methods do not consider the environment's operational constraints. As a result, the learned policy risks selecting actions that violate the constraints in emergencies, which escalates the overloading of power lines and leads to large-scale blackouts. In this work, we propose a novel method for this problem that builds on search-based planning. At the planning stage, the search space is limited to the action set produced by the policy. The selected action strictly satisfies the constraints, since its outcome is tested with the simulation function provided by the system. At the learning stage, to address the problem that gradients cannot be propagated to the policy, we introduce Evolution Strategies (ES) with black-box policy optimization to improve the policy directly, maximizing long-term returns. In the NeurIPS 2020 Learning to Run a Power Network (L2RPN) competition, our solution safely managed the power grid and ranked first in both tracks.

B. Zhou and H. Zeng contributed equally.


Notes

  1. Our code is available open-source at: https://github.com/PaddlePaddle/PARL/tree/develop/examples/NeurIPS2020-Learning-to-Run-a-Power-Network-Challenge.

References

  1. Bernard, S., Trudel, G., Scott, G.: A 735 kV shunt reactors automatic switching system for Hydro-Québec network. IEEE Trans. Power Syst. 11(4), 2024–2030 (1996)

  2. Bishop, C.M.: Pattern Recognition and Machine Learning. Information Science and Statistics. Springer, New York (2006)


  3. Brockman, G., et al.: OpenAI Gym. arXiv preprint arXiv:1606.01540 (2016)

  4. Coulom, R.: Efficient selectivity and backup operators in Monte-Carlo tree search. In: van den Herik, H.J., Ciancarini, P., Donkers, H.H.L.M.J. (eds.) CG 2006. LNCS, vol. 4630, pp. 72–83. Springer, Heidelberg (2007). https://doi.org/10.1007/978-3-540-75538-8_7


  5. Dalal, G., Gilboa, E., Mannor, S.: Hierarchical decision making in electricity grid management. In: International Conference on Machine Learning, pp. 2197–2206 (2016)


  6. Diao, R., Wang, Z., Shi, D., Chang, Q., Duan, J., Zhang, X.: Autonomous voltage control for grid operation using deep reinforcement learning. In: 2019 IEEE Power & Energy Society General Meeting (PESGM), pp. 1–5. IEEE (2019)


  7. Rechenberg, I.: Evolutionsstrategie: Optimierung technischer Systeme nach Prinzipien der biologischen Evolution. Frommann-Holzboog, Stuttgart (1973). With an afterword by Manfred Eigen

  8. Ernst, D., Glavic, M., Wehenkel, L.: Power systems stability control: reinforcement learning framework. IEEE Trans. Power Syst. 19(1), 427–435 (2004)


  9. Fisher, E.B., O’Neill, R.P., Ferris, M.C.: Optimal transmission switching. IEEE Trans. Power Syst. 23(3), 1346–1355 (2008)


  10. Garcia, C.E., Prett, D.M., Morari, M.: Model predictive control: theory and practice-a survey. Automatica 25(3), 335–348 (1989)


  11. Horgan, D., et al.: Distributed prioritized experience replay. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=H1Dy--0Z

  12. Huang, Q., Huang, R., Hao, W., Tan, J., Fan, R., Huang, Z.: Adaptive power system emergency control using deep reinforcement learning. IEEE Trans. Smart Grid 11(2), 1171–1182 (2019)


  13. Jin, L., Kumar, R., Elia, N.: Model predictive control-based real-time power system protection schemes. IEEE Trans. Power Syst. 25(2), 988–998 (2009)


  14. Khodaei, A., Shahidehpour, M.: Transmission switching in security-constrained unit commitment. IEEE Trans. Power Syst. 25(4), 1937–1945 (2010)


  15. Larsson, M., Hill, D.J., Olsson, G.: Emergency voltage control using search and predictive control. Int. J. Electr. Power Energy Syst. 24(2), 121–130 (2002)


  16. Lee, J., Hwangbo, J., Wellhausen, L., Koltun, V., Hutter, M.: Learning quadrupedal locomotion over challenging terrain. Sci. Robot. 5(47), eabc5986 (2020)


  17. Marot, A., et al.: Learning to run a power network challenge for training topology controllers. Electr. Power Syst. Res. 189, 106635 (2020)


  18. Marot, A., et al.: L2RPN: learning to run a power network in a sustainable world. NeurIPS 2020 challenge design (2020)


  19. Mnih, V., et al.: Human-level control through deep reinforcement learning. Nature 518(7540), 529–533 (2015)


  20. Otomega, B., Glavic, M., Van Cutsem, T.: Distributed undervoltage load shedding. IEEE Trans. Power Syst. 22(4), 2283–2284 (2007)


  21. Salimans, T., Ho, J., Chen, X., Sidor, S., Sutskever, I.: Evolution strategies as a scalable alternative to reinforcement learning. arXiv preprint arXiv:1703.03864 (2017)

  22. Schrittwieser, J., et al.: Mastering Atari, Go, chess and shogi by planning with a learned model. Nature 588(7839), 604–609 (2020)

  23. Schwefel, H.P.: Numerische Optimierung von Computer-Modellen mittels der Evolutionsstrategie (Teil 1, Kap. 1–5). Birkhäuser (1977)

  24. Shah, S., Sinha, A., Varakantham, P., Perrault, A., Tambe, M.: Solving online threat screening games using constrained action space reinforcement learning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 34, pp. 2226–2235 (2020)

  25. Silver, D., et al.: Mastering the game of Go with deep neural networks and tree search. Nature 529(7587), 484–489 (2016)

  26. Sutton, R.S., Barto, A.G.: Reinforcement Learning: An Introduction. MIT Press, Cambridge (2018)


  27. Tomsovic, K., Bakken, D.E., Venkatasubramanian, V., Bose, A.: Designing the next generation of real-time control, communication, and computations for large power systems. Proc. IEEE 93(5), 965–979 (2005)


  28. Trudel, G., Bernard, S., Scott, G.: Hydro-Quebec’s defence plan against extreme contingencies. IEEE Trans. Power Syst. 14(3), 958–965 (1999)


  29. Watkins, C.J., Dayan, P.: Q-learning. Mach. Learn. 8(3–4), 279–292 (1992)


  30. Yang, Q., Wang, G., Sadeghi, A., Giannakis, G.B., Sun, J.: Two-timescale voltage control in distribution grids using deep reinforcement learning. IEEE Trans. Smart Grid 11(3), 2313–2323 (2019)


  31. Yoon, D., Hong, S., Lee, B.J., Kim, K.E.: Winning the L2RPN challenge: power grid management via semi-Markov afterstate actor-critic. In: International Conference on Learning Representations (2021). https://openreview.net/forum?id=LmUJqB1Cz8


Author information


Corresponding author

Correspondence to Fan Wang.


A Grid2Op Environment

Grid2Op [18] is an open-source environment developed for testing the performance of controllers in power grid management. It simulates the physical power grid and reproduces real-world power system operational constraints and distributions.

The environment provides interactive interfaces based on the gym library [3]. Each episode simulates a period of time (e.g., a week or a month) at a time interval of 5 min. At each time step, the controller receives the state of the power grid and takes actions to operate the grid if necessary. The simulation terminates at the end of the period, or prematurely if the controller fails to operate the grid properly, which can happen under two conditions: (1) an action splits the grid into several isolated sub-grids; (2) the power transmitted from the stations cannot meet the consumption requirements of some loads. Too many disconnected lines significantly increase the risk of triggering these two conditions. A power line is disconnected automatically if its current flow exceeds the maximum limit for 3 consecutive time steps (i.e., 15 min); in this case, the line cannot be reconnected until a recovery period of 12 time steps has elapsed.
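
For concreteness, the following is a minimal sketch of this interaction loop, assuming the open-source grid2op Python package; "l2rpn_case14_sandbox" names a small public dataset for illustration, not the larger competition grid.

    import grid2op

    # Minimal sketch of the gym-style interaction loop described above.
    # Assumption: "l2rpn_case14_sandbox" is a small public grid, not the
    # 118-substation grid used in the NeurIPS 2020 competition.
    env = grid2op.make("l2rpn_case14_sandbox")

    obs = env.reset()
    done, total_reward = False, 0.0
    while not done:
        # "Do nothing": leave the grid untouched for this 5-minute step.
        action = env.action_space({})
        obs, reward, done, info = env.step(action)
        total_reward += reward

    # done becomes True at the end of the period, or prematurely if the
    # grid splits into isolated sub-grids or some load cannot be served.
    print("episode return:", total_reward)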

Grid2Op has a large state space and action space. In addition to the exponentially many possible grid topologies mentioned above, the grid state contains other features such as the current and voltage magnitude of each power line, the generation power of each power station, and the required power of each load. Although only one substation can be reconfigured per time step (to simulate that a human expert can perform only a limited number of actions in a given period), the number of available actions for topology reconfiguration is \(\sum_{i=1}^{M} 2^{|S_i|}\), where \(M\) is the number of substations and \(S_i\) is the set of elements connected to substation \(i\). In the NeurIPS 2020 L2RPN competition, the grid has 118 substations and 186 power lines, which introduces over 70,000 discrete actions corresponding to unique topologies.
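
As a toy numerical check of this formula (the per-substation element counts below are invented for illustration, not the real IEEE-118 figures):

    # Toy check of the action-count formula sum_{i=1}^{M} 2^{|S_i|}.
    elements_per_substation = [3, 4, 5, 6, 4]  # |S_i| for a fictitious grid

    n_topology_actions = sum(2 ** s for s in elements_per_substation)
    print(n_topology_actions)  # 8 + 16 + 32 + 64 + 16 = 136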

The reward setting in Grid2Op mainly reflects the reliability of the power grid. At each time step, the environment grants a bonus for safe management; the controller no longer gains positive rewards once it fails to manage the power network properly, which can lead to early termination of the episode. Operations also incur costs (penalties). To encourage the controller to explore the operational flexibility of topology reconfiguration, the cost of a topology change is much smaller than that of re-dispatching the power generation of a plant.

In the physical world, operators often use a simulation system to compute the possible outcomes of actions in order to control risk [1, 18, 28]. Grid2Op provides a similar function named simulate, which mimics the one-step operational process. It allows the user to check whether an action violates the power network constraints (e.g., whether the target power generation exceeds the maximum output of the power plant). Note that this function only looks a single step ahead (i.e., one-step simulation), and its prediction of future states may deviate from the actual state.
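
A hedged sketch of how such one-step simulation can screen a candidate action set, in the spirit of the action-set approach described in the paper; the candidate list, the rho-based overload test, and the do-nothing fallback are illustrative assumptions, not the authors' exact implementation.

    import grid2op

    env = grid2op.make("l2rpn_case14_sandbox")  # small public grid (assumption)
    obs = env.reset()

    def pick_safe_action(obs, candidate_actions, action_space):
        """Return the first candidate whose simulated outcome looks viable."""
        for action in candidate_actions:
            # One-step look-ahead; the prediction may deviate from the
            # true next state, as noted above.
            sim_obs, sim_reward, sim_done, sim_info = obs.simulate(action)
            # Reject actions that end the episode or overload a line
            # (obs.rho is the ratio of line flow to its thermal limit).
            if not sim_done and float(sim_obs.rho.max()) < 1.0:
                return action
        # Fall back to "do nothing" if every candidate looks unsafe.
        return action_space({})

    # Placeholder candidate set; the paper instead draws a top-k action
    # set from its learned policy.
    candidates = [env.action_space({})]
    action = pick_safe_action(obs, candidates, env.action_space)
    obs, reward, done, info = env.step(action)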


Copyright information

© 2021 Springer Nature Switzerland AG

About this paper


Cite this paper

Zhou, B., Zeng, H., Liu, Y., Li, K., Wang, F., Tian, H. (2021). Action Set Based Policy Optimization for Safe Power Grid Management. In: Dong, Y., Kourtellis, N., Hammer, B., Lozano, J.A. (eds) Machine Learning and Knowledge Discovery in Databases. Applied Data Science Track. ECML PKDD 2021. Lecture Notes in Computer Science, vol 12979. Springer, Cham. https://doi.org/10.1007/978-3-030-86517-7_11


  • DOI: https://doi.org/10.1007/978-3-030-86517-7_11


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-86516-0

  • Online ISBN: 978-3-030-86517-7

  • eBook Packages: Computer Science (R0)
