
Generalization and Computation for Policy Classes of Generative Adversarial Imitation Learning

  • Conference paper
Parallel Problem Solving from Nature – PPSN XVII (PPSN 2022)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13398)


Abstract

Generative adversarial imitation learning (GAIL) learns an optimal policy from expert demonstrations in an environment whose reward function is unknown. Unlike existing works that study the generalization of reward function classes or discriminator classes, we focus on policy classes. This paper investigates the generalization and computation for policy classes of GAIL. Specifically, our contributions are: 1) we prove that generalization is guaranteed in GAIL when the complexity of the policy class is properly controlled; 2) we provide an off-policy framework called the two-stage stochastic gradient (TSSG) method, which can efficiently solve GAIL based on soft policy iteration and attains a sublinear convergence rate to a stationary solution. Comprehensive numerical simulations in MuJoCo environments illustrate these results.
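
The TSSG procedure alternates two stochastic gradient stages: an update of the reward (discriminator) and a soft-policy-iteration-style improvement of the policy over off-policy data. As a rough illustration of that two-stage structure only, here is a minimal PyTorch sketch; it is not the authors' implementation. The class and method names (`TwoStageGAIL`, `reward_step`, `policy_step`), the plain Gaussian policy, and the use of the discriminator logit as a surrogate reward are all simplifying assumptions.

```python
# Minimal sketch of a two-stage, off-policy GAIL-style update loop.
# Illustrative only: all names and simplifications below are assumptions,
# not the TSSG implementation from the paper.
import torch
import torch.nn as nn


def mlp(in_dim, out_dim, hidden=64):
    return nn.Sequential(nn.Linear(in_dim, hidden), nn.Tanh(),
                         nn.Linear(hidden, out_dim))


class TwoStageGAIL:
    def __init__(self, obs_dim, act_dim, alpha=0.2, lr=3e-4):
        self.disc = mlp(obs_dim + act_dim, 1)       # discriminator D(s, a)
        self.policy = mlp(obs_dim, act_dim)         # mean of a Gaussian policy
        self.log_std = nn.Parameter(torch.zeros(act_dim))
        self.alpha = alpha                          # entropy temperature
        self.d_opt = torch.optim.Adam(self.disc.parameters(), lr=lr)
        self.p_opt = torch.optim.Adam(
            list(self.policy.parameters()) + [self.log_std], lr=lr)

    def reward_step(self, expert_sa, agent_sa):
        # Stage 1: one stochastic gradient step on the discriminator,
        # pushing D up on expert (s, a) pairs and down on agent pairs.
        bce = nn.functional.binary_cross_entropy_with_logits
        loss = (bce(self.disc(expert_sa), torch.ones(len(expert_sa), 1)) +
                bce(self.disc(agent_sa), torch.zeros(len(agent_sa), 1)))
        self.d_opt.zero_grad()
        loss.backward()
        self.d_opt.step()

    def policy_step(self, obs):
        # Stage 2: soft policy improvement -- maximize the surrogate reward
        # D(s, a) plus an entropy bonus, as in soft policy iteration.
        dist = torch.distributions.Normal(self.policy(obs), self.log_std.exp())
        act = dist.rsample()                        # reparameterized sample
        logp = dist.log_prob(act).sum(-1, keepdim=True)
        reward = self.disc(torch.cat([obs, act], dim=-1))
        loss = -(reward - self.alpha * logp).mean()  # entropy-regularized
        self.p_opt.zero_grad()
        loss.backward()
        self.p_opt.step()


# Toy usage with random batches standing in for real (s, a) data.
agent = TwoStageGAIL(obs_dim=3, act_dim=2)
expert_sa, agent_sa = torch.randn(32, 5), torch.randn(32, 5)
agent.reward_step(expert_sa, agent_sa)
agent.policy_step(torch.randn(32, 3))
```

In a full off-policy implementation, the agent batches would come from a replay buffer and Stage 2 would use soft Q-estimates rather than the raw discriminator output; the sketch only mirrors the alternation of the two stochastic gradient stages.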

Notes

  1. The Supplementary Material is released at https://github.com/MDM-shu/GAIL-Policy-Generalization-and-TSSG.

Author information

Corresponding author

Correspondence to Yaxin Peng.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Zhou, Y. et al. (2022). Generalization and Computation for Policy Classes of Generative Adversarial Imitation Learning. In: Rudolph, G., Kononova, A.V., Aguirre, H., Kerschke, P., Ochoa, G., Tušar, T. (eds) Parallel Problem Solving from Nature – PPSN XVII. PPSN 2022. Lecture Notes in Computer Science, vol 13398. Springer, Cham. https://doi.org/10.1007/978-3-031-14714-2_27

  • DOI: https://doi.org/10.1007/978-3-031-14714-2_27

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-14713-5

  • Online ISBN: 978-3-031-14714-2

  • eBook Packages: Computer Science, Computer Science (R0)
