Abstract
Engineering approaches to machine learning (including robot learning) typically seek the best learning algorithm for a particular problem, or set of problems. In contrast, the mammalian brain acts as a toolbox of different learning strategies, so that an animal can autonomously cope with any newly encountered situation through a combination of existing strategies. For example, when facing a new navigation problem, a rat can either learn a map of the environment and then plan a path to its goal within this map, or it can learn sequences of egocentric movements in response to identifiable features of the environment. For about 15 years, computational neuroscientists have searched for the coordination mechanisms that enable the mammalian brain to find efficient, if not necessarily optimal, combinations of existing learning strategies to solve new problems. Understanding such coordination principles could have great implications for robotics, enabling robots to autonomously determine which learning strategies are appropriate in different contexts. Here, we review some of the main neuroscience models for the coordination of learning strategies and present early results obtained when applying these models to robot learning. We moreover highlight the important energy costs that such bio-inspired solutions can reduce compared to current deep reinforcement learning approaches. We conclude by sketching a roadmap for further developing such bio-inspired hybrid learning approaches to robotics.
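To make this arbitration idea concrete, the following is a minimal sketch, not any of the specific architectures reviewed in the chapter: a cheap habit-like tabular Q-learner, a costly model-based planner over a learned world model, and a meta-controller that invokes planning while the habitual values are still uninformative and hands control to the habit once it becomes reliable. The toy environment, class names, and the Q-value-spread arbitration criterion are all illustrative assumptions.

```python
# Illustrative sketch only: all names and parameters are assumptions,
# not the published models reviewed in this chapter.
import random
from collections import defaultdict

ACTIONS = ["left", "right"]

class ModelFreeLearner:
    """Habit-like tabular Q-learning: cheap to query, slow to adapt."""
    def __init__(self, alpha=0.1, gamma=0.95):
        self.q = defaultdict(float)
        self.alpha, self.gamma = alpha, gamma

    def best_action(self, s):
        return max(ACTIONS, key=lambda a: self.q[(s, a)])

    def update(self, s, a, r, s2):
        target = r + self.gamma * max(self.q[(s2, b)] for b in ACTIONS)
        self.q[(s, a)] += self.alpha * (target - self.q[(s, a)])

class ModelBasedPlanner:
    """Learns a deterministic world model and plans by value iteration:
    costly at decision time, but flexible when the world changes."""
    def __init__(self, gamma=0.95):
        self.t, self.r = {}, {}
        self.gamma = gamma

    def observe(self, s, a, r, s2):
        self.t[(s, a)], self.r[(s, a)] = s2, r

    def plan(self, s, sweeps=20):
        if not any(self.r.values()):      # no reward known yet: explore
            return random.choice(ACTIONS)
        v = defaultdict(float)
        known = {si for (si, _) in self.t}
        for _ in range(sweeps):           # value iteration on the learned model
            for si in known:
                v[si] = max(self.r[(si, a)] + self.gamma * v[self.t[(si, a)]]
                            for a in ACTIONS if (si, a) in self.t)
        # unknown (s, a) pairs fall back to staying put with zero reward
        return max(ACTIONS, key=lambda a: self.r.get((s, a), 0.0)
                   + self.gamma * v[self.t.get((s, a), s)])

class MetaController:
    """Arbitration sketch: plan while habitual Q-values are uninformative
    (small spread), then switch to the cheap habit to save computation."""
    def __init__(self, mf, mb, threshold=0.05):
        self.mf, self.mb, self.threshold = mf, mb, threshold

    def act(self, s):
        qs = [self.mf.q[(s, a)] for a in ACTIONS]
        if max(qs) - min(qs) < self.threshold:   # habit not formed yet
            return self.mb.plan(s), "model-based"
        return self.mf.best_action(s), "model-free"

# Toy linear track: states 0..4, reward upon reaching state 4.
def step(s, a):
    s2 = min(4, s + 1) if a == "right" else max(0, s - 1)
    return s2, (1.0 if s2 == 4 else 0.0)

mf, mb = ModelFreeLearner(), ModelBasedPlanner()
meta = MetaController(mf, mb)
for episode in range(30):
    s, used = 0, []
    for _ in range(20):
        a, system = meta.act(s)
        s2, r = step(s, a)
        mf.update(s, a, r, s2)    # both systems learn from every transition
        mb.observe(s, a, r, s2)
        used.append(system)
        if s2 == 4:
            break
        s = s2
    print(f"episode {episode:2d}: {used.count('model-based')} planned decisions")
```

Run as is, the printed count of planned decisions drops across episodes: the costly planner dominates early learning, then the cheap habitual controller progressively takes over, which is the kind of computation saving discussed in the chapter.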
Acknowledgements
The author would like to thank all the collaborators who have contributed over the years to this line of research: in particular, Andrea Brovelli, Romain Cazé, Ricardo Chavarriaga, Laurent Dollé, Benoît Girard, Agnes Guillot, Mark Humphries, Florian Lesaint, Olivier Sigaud, and Guillaume Viejo, for their contributions to the design, implementation, testing, and analysis of computational models of the coordination of learning processes in humans and animals; and Rachid Alami, Lise Aubin, Ken Caluwaerts, Raja Chatila, Aurélie Clodic, Sandra Devin, Rémi Dromnelle, Antoine Favre-Félix, Benoît Girard, Christophe Grand, Agnes Guillot, Jean-Arcady Meyer, Steve N'Guyen, Guillaume Pourcel, Erwan Renaudo, and Mariacarla Staffa, for their contributions to the design, implementation, testing, and analysis of robotic experiments aimed at testing neuro-inspired principles for the coordination of learning processes.
Funding
This work was funded by the interdisciplinary programs (MITI) of the Centre National de la Recherche Scientifique (CNRS) under the grant 'Hippocampal replay through the prism of reinforcement learning'.
Ethics declarations
The author declares that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Khamassi, M. (2020). Adaptive Coordination of Multiple Learning Strategies in Brains and Robots. In: Martín-Vide, C., Vega-Rodríguez, M.A., Yang, MS. (eds) Theory and Practice of Natural Computing. TPNC 2020. Lecture Notes in Computer Science, vol 12494. Springer, Cham. https://doi.org/10.1007/978-3-030-63000-3_1
DOI: https://doi.org/10.1007/978-3-030-63000-3_1
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-62999-1
Online ISBN: 978-3-030-63000-3
eBook Packages: Computer Science (R0)