Sim-to-Lab-to-Real: Safe reinforcement learning with shielding and generalization guarantees

Published: 01 January 2023

Abstract

Safety is a critical requirement for autonomous systems and remains a key obstacle to deploying learning-based policies in the real world. In particular, policies learned with reinforcement learning often fail to generalize to novel environments due to unsafe behavior. In this paper, we propose Sim-to-Lab-to-Real to bridge the reality gap with a probabilistically guaranteed, safety-aware policy distribution. To improve safety, we apply a dual-policy setup: a performance policy is trained using the cumulative task reward, and a backup (safety) policy is trained by solving the Safety Bellman Equation based on Hamilton-Jacobi (HJ) reachability analysis. In Sim-to-Lab transfer, we apply a supervisory control scheme to shield unsafe actions during exploration; in Lab-to-Real transfer, we leverage the Probably Approximately Correct (PAC)-Bayes framework to provide lower bounds on the expected performance and safety of policies in unseen environments. Moreover, because the safety policy inherits from HJ reachability analysis, the bound accounts for the expectation over the worst-case safety in each environment. We empirically study the proposed framework for ego-vision navigation in two types of indoor environments with varying degrees of photorealism, and demonstrate strong generalization performance through hardware experiments in real indoor spaces with a quadrupedal robot. See https://sites.google.com/princeton.edu/sim-to-lab-to-real for supplementary material.
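
The supervisory shielding scheme the abstract describes can be sketched compactly. Below is a minimal Python illustration of value-based shielding under stated assumptions: a safety critic q_safety(obs, action) trained via the Safety Bellman Equation (where, following the HJ reachability convention assumed here, values above a threshold predict eventual constraint violation), a task-reward policy perf_policy, and a backup policy backup_policy. All of these names and the threshold eps are illustrative, not the paper's actual interfaces.

```python
def shielded_action(obs, perf_policy, backup_policy, q_safety, eps=0.0):
    """Supervisory shield: execute the performance policy's action unless
    the safety critic predicts it leads to a constraint violation."""
    a_perf = perf_policy(obs)
    # The safety critic approximates the worst-case future safety margin
    # from (obs, a_perf); a value above the threshold flags the action
    # as unsafe under the sign convention assumed above.
    if q_safety(obs, a_perf) > eps:
        return backup_policy(obs)  # fall back to the safety policy
    return a_perf
```

During Sim-to-Lab training, a shield of this kind filters exploratory actions, so the agent keeps learning from the task reward while the backup policy intervenes only when the critic rejects the proposed action.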
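For the Lab-to-Real guarantee, the abstract invokes the PAC-Bayes framework. As one concrete form, a standard McAllester-style bound (not necessarily the exact inequality used in the paper) reads: with probability at least 1 - \delta over the draw of N Lab environments, a posterior policy distribution P with prior P_0 satisfies

```latex
R(P) \;\ge\; \hat{R}(P) \;-\;
\sqrt{\frac{\mathrm{KL}(P \,\|\, P_0) + \ln\frac{2\sqrt{N}}{\delta}}{2N}}
```

where \hat{R}(P) is the empirical success (or safety) rate over the N training environments and R(P) its expectation over unseen environments from the same distribution; maximizing the right-hand side over P yields the certified lower bound on expected performance and safety.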

Published In

Artificial Intelligence, Volume 314, Issue C
January 2023, 467 pages

Publisher

Elsevier Science Publishers Ltd., United Kingdom


Author Tags

1. Reinforcement learning
2. Sim-to-Real transfer
3. Safety analysis
4. Generalization

Qualifiers

• Research-article
