Abstract
Generative adversarial imitation learning (GAIL) learns an optimal policy from expert demonstrations in environments with unknown reward functions. Unlike existing works that study the generalization of reward-function classes or discriminator classes, we focus on policy classes. This paper investigates the generalization and computation of GAIL with respect to policy classes. Specifically, our contributions are twofold: 1) we prove that generalization is guaranteed in GAIL when the complexity of the policy class is properly controlled; 2) we provide an off-policy framework, the two-stage stochastic gradient (TSSG) method, which efficiently solves GAIL based on soft policy iteration and attains a sublinear convergence rate to a stationary solution. Comprehensive numerical simulations in MuJoCo environments illustrate these results.
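The two-stage scheme summarized above (a discriminator/reward update followed by a soft policy improvement step) can be sketched on a toy problem. The snippet below is a minimal illustration, not the paper's TSSG algorithm: the bandit-style problem, the logistic discriminator, the uniform state-visitation assumption, and all step sizes are hypothetical choices made for this sketch.

```python
import numpy as np

# Toy sketch of an alternating two-stage scheme in the spirit of
# "discriminator/reward step, then soft policy improvement".
# All sizes and step sizes here are illustrative assumptions.

n_states, n_actions = 2, 2
alpha = 0.5    # entropy temperature for the soft policy step (assumed)
lr_w = 0.2     # discriminator step size (assumed)

# Hypothetical expert policy: prefers action 0 in state 0, action 1 in state 1.
expert_pi = np.array([[0.9, 0.1],
                      [0.1, 0.9]])

pi = np.full((n_states, n_actions), 0.5)   # learner starts uniform
w = np.zeros((n_states, n_actions))        # logits of a logistic discriminator

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

for _ in range(300):
    # Stage 1: one ascent step on a logistic discrimination objective,
    # assuming a uniform state-visitation distribution for simplicity:
    # push w up on expert-weighted (s, a), down on learner-weighted (s, a).
    grad = expert_pi * (1.0 - sigmoid(w)) - pi * sigmoid(w)
    w += lr_w * grad

    # Stage 2: soft policy improvement, treating w as a one-step "Q":
    # pi(a|s) proportional to exp(Q(s, a) / alpha).
    pi = np.exp(w / alpha)
    pi /= pi.sum(axis=1, keepdims=True)

print(np.round(pi, 2))  # the learner's preferred actions should match the expert's
```

Under these assumptions the learner's action preferences drift toward the expert's; the off-policy, sample-based machinery of TSSG (replay buffers, stochastic gradients, function approximation) is beyond this sketch.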
Notes
1. The Supplementary Material is released at https://github.com/MDM-shu/GAIL-Policy-Generalization-and-TSSG.
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Zhou, Y. et al. (2022). Generalization and Computation for Policy Classes of Generative Adversarial Imitation Learning. In: Rudolph, G., Kononova, A.V., Aguirre, H., Kerschke, P., Ochoa, G., Tušar, T. (eds) Parallel Problem Solving from Nature – PPSN XVII. PPSN 2022. Lecture Notes in Computer Science, vol 13398. Springer, Cham. https://doi.org/10.1007/978-3-031-14714-2_27
DOI: https://doi.org/10.1007/978-3-031-14714-2_27
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-14713-5
Online ISBN: 978-3-031-14714-2
eBook Packages: Computer Science, Computer Science (R0)