Abstract
In this paper we propose an algorithm for multi-agent Q-learning. The algorithm is inspired by the natural behaviour of ants, which deposit pheromone in the environment to communicate. Beyond simulating the behaviour of an ant colony, the benefit is a way of designing complex multi-agent systems in which complex behaviour emerges from relatively simple interacting agents. The proposed Q-learning update equation includes a belief factor, which reflects the confidence the agent has in the pheromone detected in its environment. Agents communicate implicitly to co-ordinate and co-operate in learning to solve a problem.
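The abstract refers to a Q-learning update that incorporates a belief factor weighting the pheromone an agent detects. Since the update equation itself is not reproduced on this page, the sketch below is only a plausible illustration in Python of how such an update might look: the class name, the way pheromone enters the learning target, the deposit and evaporation rules, and all parameter values are assumptions for illustration, not the authors' actual equation.

```python
import random
from collections import defaultdict

# Hypothetical sketch of a pheromone-weighted Q-learning update.
# The belief factor `beta` and the way pheromone enters the target
# are assumptions; the paper's actual update equation may differ.

class PheromoneQLearner:
    def __init__(self, actions, alpha=0.1, gamma=0.9, beta=0.5, epsilon=0.1):
        self.q = defaultdict(float)          # Q(s, a) table
        self.pheromone = defaultdict(float)  # synthetic pheromone per state
        self.actions = actions
        self.alpha, self.gamma = alpha, gamma
        self.beta = beta        # belief factor: confidence in detected pheromone
        self.epsilon = epsilon

    def choose_action(self, state):
        # Epsilon-greedy selection over Q-values.
        if random.random() < self.epsilon:
            return random.choice(self.actions)
        return max(self.actions, key=lambda a: self.q[(state, a)])

    def update(self, state, action, reward, next_state):
        # Standard one-step Q-learning target, plus a pheromone term
        # scaled by the belief factor (assumed form of the update).
        best_next = max(self.q[(next_state, a)] for a in self.actions)
        target = reward + self.gamma * (best_next + self.beta * self.pheromone[next_state])
        self.q[(state, action)] += self.alpha * (target - self.q[(state, action)])
        # The agent deposits pheromone in the state it leaves (assumed deposit rule),
        # providing the implicit communication channel for other agents.
        self.pheromone[state] += 1.0

    def evaporate(self, rate=0.05):
        # Pheromone decays over time, as natural ant trails do.
        for s in self.pheromone:
            self.pheromone[s] *= (1.0 - rate)
```

Under these assumptions, agents sharing the same `pheromone` table co-ordinate implicitly: each agent's deposits bias the learning targets of the others, while evaporation limits the influence of stale trails.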
Copyright information
© 2002 Springer-Verlag Berlin Heidelberg
Cite this paper
Monekosso, N., Remagnino, P., Szarowicz, A. (2002). An Improved Q-Learning Algorithm Using Synthetic Pheromones. In: Dunin-Keplicz, B., Nawarecki, E. (eds) From Theory to Practice in Multi-Agent Systems. CEEMAS 2001. Lecture Notes in Computer Science, vol 2296. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45941-3_21
DOI: https://doi.org/10.1007/3-540-45941-3_21
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-43370-5
Online ISBN: 978-3-540-45941-5