Abstract
Engineering approaches to machine learning (including robot learning) typically seek the best learning algorithm for a particular problem, or set of problems. In contrast, the mammalian brain acts as a toolbox of different learning strategies, so that an animal can autonomously cope with any newly encountered situation through a combination of existing strategies. For example, when facing a new navigation problem, a rat can either learn a map of the environment and then plan a path to its goal within this map, or it can learn sequences of egocentric movements in response to identifiable features of the environment. For about 15 years, computational neuroscientists have searched for the coordination mechanisms that enable the mammalian brain to find efficient, if not necessarily optimal, combinations of existing learning strategies to solve new problems. Understanding such coordination principles could have great implications for robotics, enabling robots to autonomously determine which learning strategies are appropriate in different contexts. Here, we review some of the main neuroscience models for the coordination of learning strategies and present early results obtained when applying these models to robot learning. We moreover highlight the important energy costs that such bio-inspired solutions can reduce compared to current deep reinforcement learning approaches. We conclude by sketching a roadmap for further developing such bio-inspired hybrid learning approaches to robotics.
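To make this arbitration idea concrete, the following is a minimal sketch, not any of the specific architectures reviewed in the chapter: a cheap habit-like tabular Q-learner, a costly model-based planner over a learned world model, and a meta-controller that invokes planning while the habitual values are still uninformative and hands control to the habit once it becomes reliable. The toy environment, class names, and the Q-value-spread arbitration criterion are all illustrative assumptions.

```python
# Illustrative sketch only: all names and parameters are assumptions,
# not the published models reviewed in this chapter.
import random
from collections import defaultdict

ACTIONS = ["left", "right"]

class ModelFreeLearner:
    """Habit-like tabular Q-learning: cheap to query, slow to adapt."""
    def __init__(self, alpha=0.1, gamma=0.95):
        self.q = defaultdict(float)
        self.alpha, self.gamma = alpha, gamma

    def best_action(self, s):
        return max(ACTIONS, key=lambda a: self.q[(s, a)])

    def update(self, s, a, r, s2):
        target = r + self.gamma * max(self.q[(s2, b)] for b in ACTIONS)
        self.q[(s, a)] += self.alpha * (target - self.q[(s, a)])

class ModelBasedPlanner:
    """Learns a deterministic world model and plans by value iteration:
    costly at decision time, but flexible when the world changes."""
    def __init__(self, gamma=0.95):
        self.t, self.r = {}, {}
        self.gamma = gamma

    def observe(self, s, a, r, s2):
        self.t[(s, a)], self.r[(s, a)] = s2, r

    def plan(self, s, sweeps=20):
        if not any(self.r.values()):      # no reward known yet: explore
            return random.choice(ACTIONS)
        v = defaultdict(float)
        known = {si for (si, _) in self.t}
        for _ in range(sweeps):           # value iteration on the learned model
            for si in known:
                v[si] = max(self.r[(si, a)] + self.gamma * v[self.t[(si, a)]]
                            for a in ACTIONS if (si, a) in self.t)
        # unknown (s, a) pairs fall back to staying put with zero reward
        return max(ACTIONS, key=lambda a: self.r.get((s, a), 0.0)
                   + self.gamma * v[self.t.get((s, a), s)])

class MetaController:
    """Arbitration sketch: plan while habitual Q-values are uninformative
    (small spread), then switch to the cheap habit to save computation."""
    def __init__(self, mf, mb, threshold=0.05):
        self.mf, self.mb, self.threshold = mf, mb, threshold

    def act(self, s):
        qs = [self.mf.q[(s, a)] for a in ACTIONS]
        if max(qs) - min(qs) < self.threshold:   # habit not formed yet
            return self.mb.plan(s), "model-based"
        return self.mf.best_action(s), "model-free"

# Toy linear track: states 0..4, reward upon reaching state 4.
def step(s, a):
    s2 = min(4, s + 1) if a == "right" else max(0, s - 1)
    return s2, (1.0 if s2 == 4 else 0.0)

mf, mb = ModelFreeLearner(), ModelBasedPlanner()
meta = MetaController(mf, mb)
for episode in range(30):
    s, used = 0, []
    for _ in range(20):
        a, system = meta.act(s)
        s2, r = step(s, a)
        mf.update(s, a, r, s2)    # both systems learn from every transition
        mb.observe(s, a, r, s2)
        used.append(system)
        if s2 == 4:
            break
        s = s2
    print(f"episode {episode:2d}: {used.count('model-based')} planned decisions")
```

Run as is, the printed count of planned decisions drops across episodes: the costly planner dominates early learning, then the cheap habitual controller progressively takes over, which is the kind of computation saving discussed in the chapter.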
Acknowledgements
The author would like to thank all the collaborators who have contributed over the years to this line of research: in particular, Andrea Brovelli, Romain Cazé, Ricardo Chavarriaga, Laurent Dollé, Benoît Girard, Agnes Guillot, Mark Humphries, Florian Lesaint, Olivier Sigaud, and Guillaume Viejo, for their contributions to the design, implementation, testing, and analysis of computational models of the coordination of learning processes in humans and animals; and Rachid Alami, Lise Aubin, Ken Caluwaerts, Raja Chatila, Aurélie Clodic, Sandra Devin, Rémi Dromnelle, Antoine Favre-Félix, Benoît Girard, Christophe Grand, Agnes Guillot, Jean-Arcady Meyer, Steve N'Guyen, Guillaume Pourcel, Erwan Renaudo, and Mariacarla Staffa, for their contributions to the design, implementation, testing, and analysis of robotic experiments aimed at testing neuro-inspired principles for the coordination of learning processes.
Funding
This work was funded by the interdisciplinary programs (MITI) of the Centre National de la Recherche Scientifique (CNRS) under the grant 'Hippocampal replay through the prism of reinforcement learning'.
Ethics declarations
The author declares that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Khamassi, M. (2020). Adaptive Coordination of Multiple Learning Strategies in Brains and Robots. In: Martín-Vide, C., Vega-Rodríguez, M.A., Yang, MS. (eds) Theory and Practice of Natural Computing. TPNC 2020. Lecture Notes in Computer Science, vol 12494. Springer, Cham. https://doi.org/10.1007/978-3-030-63000-3_1
DOI: https://doi.org/10.1007/978-3-030-63000-3_1
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-62999-1
Online ISBN: 978-3-030-63000-3
eBook Packages: Computer Science (R0)