
Adaptive Coordination of Multiple Learning Strategies in Brains and Robots

  • Conference paper
Theory and Practice of Natural Computing (TPNC 2020)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 12494)


Abstract

Engineering approaches to machine learning (including robot learning) typically seek the best learning algorithm for a particular problem or set of problems. In contrast, the mammalian brain acts as a toolbox of different learning strategies, so that an animal can autonomously learn any newly encountered situation with a combination of existing strategies. For example, when facing a new navigation problem, a rat can either learn a map of the environment and then plan a path to its goal within that map, or it can learn sequences of egocentric movements in response to identifiable features of the environment. For about 15 years, computational neuroscientists have searched for the coordination mechanisms that enable the mammalian brain to find efficient, if not necessarily optimal, combinations of existing learning strategies for solving new problems. Understanding such coordination principles could have great implications for robotics, enabling robots to autonomously determine which learning strategies are appropriate in different contexts. Here, we review some of the main neuroscience models for the coordination of learning strategies and present early results obtained when applying these models to robot learning. We moreover highlight the substantial energy costs that such bio-inspired solutions can reduce compared to current deep reinforcement learning approaches. We conclude by sketching a roadmap for further developing such bio-inspired hybrid learning approaches to robotics.
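
The central idea of the abstract, arbitrating between a cheap habitual (model-free) learner and a costly planning (model-based) learner, can be made concrete with a small sketch. The Python code below is an illustrative toy, not the architecture of any of the reviewed models: the class names, grid size, entropy criterion, and all parameter values are assumptions chosen for clarity, and the arbitration rule (query the planner only when the habitual policy is still uncertain) is just one of several criteria discussed in this literature.

```python
import numpy as np

# Minimal sketch (assumed toy setup, not the paper's architecture): a
# meta-controller arbitrates between a model-free expert and a model-based
# expert on a small tabular navigation task.

N_STATES, N_ACTIONS = 25, 4          # e.g. a 5x5 grid-world (assumption)
ALPHA, GAMMA, BETA = 0.1, 0.95, 5.0  # learning rate, discount, softmax gain
PLANNING_SWEEPS = 20                 # value-iteration sweeps per MB query
ENTROPY_THRESHOLD = 0.5              # arbitrary cut-off for "habit uncertain"

class ModelFreeExpert:
    """Tabular Q-learning: cheap to query, but slow to adapt to change."""
    def __init__(self):
        self.Q = np.zeros((N_STATES, N_ACTIONS))

    def update(self, s, a, r, s_next):
        td_error = r + GAMMA * self.Q[s_next].max() - self.Q[s, a]
        self.Q[s, a] += ALPHA * td_error

    def action_values(self, s):
        return self.Q[s]

class ModelBasedExpert:
    """Learns transition/reward models and plans by value iteration:
    costly to query (many sweeps), but adapts quickly to task changes."""
    def __init__(self):
        self.T = np.full((N_STATES, N_ACTIONS, N_STATES), 1.0 / N_STATES)
        self.R = np.zeros((N_STATES, N_ACTIONS))

    def update(self, s, a, r, s_next):
        self.T[s, a] *= 0.9                  # decay old transition statistics
        self.T[s, a, s_next] += 0.1
        self.T[s, a] /= self.T[s, a].sum()
        self.R[s, a] += 0.2 * (r - self.R[s, a])

    def action_values(self, s):
        V = np.zeros(N_STATES)
        for _ in range(PLANNING_SWEEPS):     # planning = simulated experience
            V = (self.R + GAMMA * self.T @ V).max(axis=1)
        return self.R[s] + GAMMA * self.T[s] @ V

def softmax(q):
    p = np.exp(BETA * (q - q.max()))
    return p / p.sum()

def choose_action(mf, mb, s, rng):
    """Meta-controller: pay the planning cost only when the habitual policy
    is still uncertain, which is where the energy savings come from."""
    p_mf = softmax(mf.action_values(s))
    entropy = -(p_mf * np.log(p_mf + 1e-12)).sum() / np.log(N_ACTIONS)
    if entropy > ENTROPY_THRESHOLD:          # uncertain habit -> plan
        return rng.choice(N_ACTIONS, p=softmax(mb.action_values(s)))
    return rng.choice(N_ACTIONS, p=p_mf)     # confident habit -> cheap lookup
```

In use, both experts would be updated after every observed transition (with, e.g., rng = np.random.default_rng()), while choose_action decides on each step whether planning is worth its cost. Since the model-based expert's value-iteration sweeps dominate computation, gating them on habit uncertainty is one simple way to obtain the reduction in computational and energy cost that the abstract refers to.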




Acknowledgements

The author would like to thank all the collaborators who have contributed over the years to this line of research: in particular, Andrea Brovelli, Romain Cazé, Ricardo Chavarriaga, Laurent Dollé, Benoît Girard, Agnes Guillot, Mark Humphries, Florian Lesaint, Olivier Sigaud, and Guillaume Viejo, for their contribution to the design, implementation, testing, and analysis of computational models of the coordination of learning processes in humans and animals; and Rachid Alami, Lise Aubin, Ken Caluwaerts, Raja Chatila, Aurélie Clodic, Sandra Devin, Rémi Dromnelle, Antoine Favre-Félix, Benoît Girard, Christophe Grand, Agnes Guillot, Jean-Arcady Meyer, Steve N'Guyen, Guillaume Pourcel, Erwan Renaudo, and Mariacarla Staffa, for their contribution to the design, implementation, testing, and analysis of robotic experiments aimed at testing neuro-inspired principles for the coordination of learning processes.

Funding

This work has been funded by the Centre National de la Recherche Scientifique (CNRS) interdisciplinary programs (MITI), under the grant 'Hippocampal replay through the prism of reinforcement learning'.

Author information


Corresponding author

Correspondence to Mehdi Khamassi.


Ethics declarations

The author declares that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.


Copyright information

© 2020 Springer Nature Switzerland AG

About this paper


Cite this paper

Khamassi, M. (2020). Adaptive Coordination of Multiple Learning Strategies in Brains and Robots. In: Martín-Vide, C., Vega-Rodríguez, M.A., Yang, M.-S. (eds.) Theory and Practice of Natural Computing. TPNC 2020. Lecture Notes in Computer Science, vol 12494. Springer, Cham. https://doi.org/10.1007/978-3-030-63000-3_1


  • DOI: https://doi.org/10.1007/978-3-030-63000-3_1

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-62999-1

  • Online ISBN: 978-3-030-63000-3

  • eBook Packages: Computer Science (R0)
