
Action Set Based Policy Optimization for Safe Power Grid Management

  • Conference paper
Machine Learning and Knowledge Discovery in Databases. Applied Data Science Track (ECML PKDD 2021)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 12979)


Abstract

Maintaining the stability of the modern power grid is becoming increasingly difficult due to fluctuating power consumption, the unstable power supply from renewable energy sources, and unpredictable accidents such as man-made and natural disasters. Because operations on the power grid must account for their impact on future stability, reinforcement learning (RL) has been employed to provide sequential decision-making in power grid management. However, existing methods do not consider the environment's operational constraints. As a result, the learned policy risks selecting actions that violate the constraints in emergencies, which escalates the overloading of power lines and leads to large-scale blackouts. In this work, we propose a novel method for this problem that builds on search-based planning. At the planning stage, the search space is limited to the action set produced by the policy. The selected action strictly satisfies the constraints, since its outcome is tested with the simulation function provided by the system. At the learning stage, to address the problem that gradients cannot be propagated to the policy, we introduce Evolution Strategies (ES) with black-box policy optimization to improve the policy directly, maximizing long-term returns. In the NeurIPS 2020 Learning to Run a Power Network (L2RPN) competition, our solution safely managed the power grid and ranked first in both tracks.

B. Zhou and H. Zeng contributed equally.


Notes

  1. Our code is available open-source at: https://github.com/PaddlePaddle/PARL/tree/develop/examples/NeurIPS2020-Learning-to-Run-a-Power-Network-Challenge.

References

  1. Bernard, S., Trudel, G., Scott, G.: A 735 kV shunt reactors automatic switching system for Hydro-Québec network. IEEE Trans. Power Syst. 11(4), 2024–2030 (1996)

  2. Bishop, C.M.: Pattern Recognition and Machine Learning. Information Science and Statistics. Springer, New York (2006)


  3. Brockman, G., et al.: OpenAI Gym. arXiv preprint arXiv:1606.01540 (2016)

  4. Coulom, R.: Efficient selectivity and backup operators in Monte-Carlo tree search. In: van den Herik, H.J., Ciancarini, P., Donkers, H.H.L.M.J. (eds.) CG 2006. LNCS, vol. 4630, pp. 72–83. Springer, Heidelberg (2007). https://doi.org/10.1007/978-3-540-75538-8_7


  5. Dalal, G., Gilboa, E., Mannor, S.: Hierarchical decision making in electricity grid management. In: International Conference on Machine Learning, pp. 2197–2206 (2016)


  6. Diao, R., Wang, Z., Shi, D., Chang, Q., Duan, J., Zhang, X.: Autonomous voltage control for grid operation using deep reinforcement learning. In: 2019 IEEE Power & Energy Society General Meeting (PESGM), pp. 1–5. IEEE (2019)


  7. Rechenberg, I.: Evolutionsstrategie: Optimierung technischer Systeme nach Prinzipien der biologischen Evolution. Frommann-Holzboog, Stuttgart (1973). With an afterword by Manfred Eigen

  8. Ernst, D., Glavic, M., Wehenkel, L.: Power systems stability control: reinforcement learning framework. IEEE Trans. Power Syst. 19(1), 427–435 (2004)


  9. Fisher, E.B., O’Neill, R.P., Ferris, M.C.: Optimal transmission switching. IEEE Trans. Power Syst. 23(3), 1346–1355 (2008)


  10. Garcia, C.E., Prett, D.M., Morari, M.: Model predictive control: theory and practice-a survey. Automatica 25(3), 335–348 (1989)


  11. Horgan, D., et al.: Distributed prioritized experience replay. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=H1Dy--0Z

  12. Huang, Q., Huang, R., Hao, W., Tan, J., Fan, R., Huang, Z.: Adaptive power system emergency control using deep reinforcement learning. IEEE Trans. Smart Grid 11(2), 1171–1182 (2019)


  13. Jin, L., Kumar, R., Elia, N.: Model predictive control-based real-time power system protection schemes. IEEE Trans. Power Syst. 25(2), 988–998 (2009)


  14. Khodaei, A., Shahidehpour, M.: Transmission switching in security-constrained unit commitment. IEEE Trans. Power Syst. 25(4), 1937–1945 (2010)


  15. Larsson, M., Hill, D.J., Olsson, G.: Emergency voltage control using search and predictive control. Int. J. Electr. Power Energy Syst. 24(2), 121–130 (2002)


  16. Lee, J., Hwangbo, J., Wellhausen, L., Koltun, V., Hutter, M.: Learning quadrupedal locomotion over challenging terrain. Sci. Robot. 5(47), eabc5986 (2020)


  17. Marot, A., et al.: Learning to run a power network challenge for training topology controllers. Electr. Power Syst. Res. 189, 106635 (2020)


  18. Marot, A., et al.: L2RPN: learning to run a power network in a sustainable world. NeurIPS 2020 challenge design (2020)


  19. Mnih, V., et al.: Human-level control through deep reinforcement learning. Nature 518(7540), 529–533 (2015)


  20. Otomega, B., Glavic, M., Van Cutsem, T.: Distributed undervoltage load shedding. IEEE Trans. Power Syst. 22(4), 2283–2284 (2007)


  21. Salimans, T., Ho, J., Chen, X., Sidor, S., Sutskever, I.: Evolution strategies as a scalable alternative to reinforcement learning. arXiv preprint arXiv:1703.03864 (2017)

  22. Schrittwieser, J., et al.: Mastering Atari, Go, chess and shogi by planning with a learned model. Nature 588(7839), 604–609 (2020)

  23. Schwefel, H.P.: Numerische Optimierung von Computer-Modellen mittels der Evolutionsstrategie (Teil 1, Kap. 1–5). Birkhäuser (1977)

  24. Shah, S., Sinha, A., Varakantham, P., Perrault, A., Tambe, M.: Solving online threat screening games using constrained action space reinforcement learning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 34, pp. 2226–2235 (2020)

  25. Silver, D., et al.: Mastering the game of Go with deep neural networks and tree search. Nature 529(7587), 484–489 (2016)

  26. Sutton, R.S., Barto, A.G.: Reinforcement Learning: An Introduction. MIT Press, Cambridge (2018)


  27. Tomsovic, K., Bakken, D.E., Venkatasubramanian, V., Bose, A.: Designing the next generation of real-time control, communication, and computations for large power systems. Proc. IEEE 93(5), 965–979 (2005)


  28. Trudel, G., Bernard, S., Scott, G.: Hydro-Quebec’s defence plan against extreme contingencies. IEEE Trans. Power Syst. 14(3), 958–965 (1999)


  29. Watkins, C.J., Dayan, P.: Q-learning. Mach. Learn. 8(3–4), 279–292 (1992)


  30. Yang, Q., Wang, G., Sadeghi, A., Giannakis, G.B., Sun, J.: Two-timescale voltage control in distribution grids using deep reinforcement learning. IEEE Trans. Smart Grid 11(3), 2313–2323 (2019)


  31. Yoon, D., Hong, S., Lee, B.J., Kim, K.E.: Winning the L2RPN challenge: power grid management via semi-Markov afterstate actor-critic. In: International Conference on Learning Representations (2021). https://openreview.net/forum?id=LmUJqB1Cz8


Author information


Corresponding author

Correspondence to Fan Wang.


A Grid2Op Environment

Grid2Op [18] is an open-source environment developed for testing the performance of controllers in power grid management. It simulates the physical power grid and reproduces real-world power system operational constraints and distributions.

The environment provides interactive interfaces based on the gym library [3]. Each episode simulates a period of time (e.g., a week or a month) at a time interval of 5 min. At each time step, the controller receives the state of the power grid and takes actions to operate the grid if necessary. The simulation terminates at the end of the period, or prematurely if the controller fails to operate the grid properly, which can happen under two conditions: (1) an action splits the grid into several isolated sub-grids; (2) the power transmitted from the stations cannot meet the consumption requirements of some loads. Too many disconnected lines significantly increase the risk of triggering these two conditions. A power line is disconnected automatically if its current flow exceeds the maximum limit for 3 consecutive time steps (i.e., 15 min); in this case, the line cannot be reconnected until a recovery period of 12 time steps has elapsed.
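
For concreteness, the following is a minimal sketch of this interaction loop, assuming the open-source grid2op Python package; "l2rpn_case14_sandbox" names a small public dataset for illustration, not the larger competition grid.

    import grid2op

    # Minimal sketch of the gym-style interaction loop described above.
    # Assumption: "l2rpn_case14_sandbox" is a small public grid, not the
    # 118-substation grid used in the NeurIPS 2020 competition.
    env = grid2op.make("l2rpn_case14_sandbox")

    obs = env.reset()
    done, total_reward = False, 0.0
    while not done:
        # "Do nothing": leave the grid untouched for this 5-minute step.
        action = env.action_space({})
        obs, reward, done, info = env.step(action)
        total_reward += reward

    # done becomes True at the end of the period, or prematurely if the
    # grid splits into isolated sub-grids or some load cannot be served.
    print("episode return:", total_reward)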

Grid2Op has a large state space and action space. In addition to the exponentially many possible grid topologies mentioned above, the grid state contains other features such as the current and voltage magnitude of each power line, the generation power of each power station, and the required power of each load. Although only one substation can be reconfigured per time step (to simulate that a human expert can perform only a limited number of actions in a given period), the number of available actions for topology reconfiguration is \(\sum_{i=1}^{M} 2^{|S_i|}\), where \(M\) is the number of substations and \(S_i\) is the set of elements connected to substation \(i\). In the NeurIPS 2020 L2RPN competition, the grid has 118 substations and 186 power lines, which introduces over 70,000 discrete actions corresponding to unique topologies.
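
As a toy numerical check of this formula (the per-substation element counts below are invented for illustration, not the real IEEE-118 figures):

    # Toy check of the action-count formula sum_{i=1}^{M} 2^{|S_i|}.
    elements_per_substation = [3, 4, 5, 6, 4]  # |S_i| for a fictitious grid

    n_topology_actions = sum(2 ** s for s in elements_per_substation)
    print(n_topology_actions)  # 8 + 16 + 32 + 64 + 16 = 136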

The reward setting in Grid2Op mainly reflects the reliability of the power grid. At each time step, the environment grants a bonus for safe management; the controller no longer gains positive rewards once it fails to manage the power network properly, which can lead to early termination of the episode. Operations also incur costs (penalties). To encourage the controller to explore the operational flexibility of topology reconfiguration, the cost of a topology change is much smaller than that of re-dispatching the power generation of a plant.

In the physical world, operators often use a simulation system to compute the possible outcomes of actions in order to control risk [1, 18, 28]. Grid2Op provides a similar function named simulate, which mimics the one-step operational process. It allows the user to check whether an action violates the power network constraints (e.g., whether the target power generation exceeds the maximum output of the power plant). Note that this function only looks a single step ahead (i.e., one-step simulation), and its prediction of future states may deviate from the actual state.
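
A hedged sketch of how such one-step simulation can screen a candidate action set, in the spirit of the action-set approach described in the paper; the candidate list, the rho-based overload test, and the do-nothing fallback are illustrative assumptions, not the authors' exact implementation.

    import grid2op

    env = grid2op.make("l2rpn_case14_sandbox")  # small public grid (assumption)
    obs = env.reset()

    def pick_safe_action(obs, candidate_actions, action_space):
        """Return the first candidate whose simulated outcome looks viable."""
        for action in candidate_actions:
            # One-step look-ahead; the prediction may deviate from the
            # true next state, as noted above.
            sim_obs, sim_reward, sim_done, sim_info = obs.simulate(action)
            # Reject actions that end the episode or overload a line
            # (obs.rho is the ratio of line flow to its thermal limit).
            if not sim_done and float(sim_obs.rho.max()) < 1.0:
                return action
        # Fall back to "do nothing" if every candidate looks unsafe.
        return action_space({})

    # Placeholder candidate set; the paper instead draws a top-k action
    # set from its learned policy.
    candidates = [env.action_space({})]
    action = pick_safe_action(obs, candidates, env.action_space)
    obs, reward, done, info = env.step(action)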


Copyright information

© 2021 Springer Nature Switzerland AG

About this paper


Cite this paper

Zhou, B., Zeng, H., Liu, Y., Li, K., Wang, F., Tian, H. (2021). Action Set Based Policy Optimization for Safe Power Grid Management. In: Dong, Y., Kourtellis, N., Hammer, B., Lozano, J.A. (eds) Machine Learning and Knowledge Discovery in Databases. Applied Data Science Track. ECML PKDD 2021. Lecture Notes in Computer Science, vol 12979. Springer, Cham. https://doi.org/10.1007/978-3-030-86517-7_11


  • DOI: https://doi.org/10.1007/978-3-030-86517-7_11


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-86516-0

  • Online ISBN: 978-3-030-86517-7

  • eBook Packages: Computer Science (R0)
