Abstract
In psychology, goal-setting theory, which has been studied for over 35 years, reveals that goals play a significant role in motivation, action, and performance for human beings. Based on this theory, the goal net model has been proposed for designing intelligent agents, which can, in a sense, be viewed as soft copies of human beings. The goal net model has been successfully applied to many agents, especially non-player-character agents in computer games. Such an agent selects the optimal solution among all possible solutions found by a recursive algorithm. However, if a goal net is very complex, the selection can take too long for the agent to respond quickly when it needs to re-select a solution in reaction to changes in the world. Moreover, in some dynamic environments, it is impossible to know the exact outcome of choosing a solution in advance, so the possible solutions cannot be evaluated precisely. To address these problems, this paper applies a learning algorithm to goal selection in dynamic environments. More specifically, we first develop a reorganization algorithm that converts a goal net into an equivalent counterpart on which a Q-learning algorithm can operate; then we define the key component of Q-learning, the reward function, according to the features of goal nets; and finally we conduct extensive experiments showing that, in dynamic environments, an agent using the learning algorithm significantly outperforms one using the recursive search algorithm. Therefore, our work suggests an agent model that can be applied effectively in dynamic, time-sensitive domains such as computer games and P2P systems for online movie watching.
Notes
In this paper, "goal net" means goal network; it is unrelated to soccer goal nets.
We give the details of the algorithm here for the readers' convenience.
The status variables could include more than gain and cost. However, other variables are beyond the scope of this paper and are worth discussing in future work.
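To illustrate the approach described in the abstract, the following is a minimal sketch of tabular Q-learning applied to goal selection. The goal net here (`GOAL_NET`), its state names, and the reward values are all hypothetical, invented for illustration; the reward of each transition is treated as gain minus cost, in the spirit of the status variables discussed above, and this is not the paper's actual reorganization algorithm or reward function.

```python
import random

random.seed(0)  # for reproducibility of this illustrative run

# Hypothetical miniature goal net: each goal state maps to the goal
# states reachable from it. All names and values are illustrative.
GOAL_NET = {
    "start": ["g1", "g2"],
    "g1": ["final"],
    "g2": ["final"],
    "final": [],
}

# Reward of each transition, here taken as gain minus cost (assumed values).
REWARD = {
    ("start", "g1"): 1.0, ("start", "g2"): 3.0,
    ("g1", "final"): 5.0, ("g2", "final"): 2.0,
}

def q_learning(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1):
    """Standard tabular Q-learning with an epsilon-greedy policy."""
    q = {(s, a): 0.0 for s, acts in GOAL_NET.items() for a in acts}
    for _ in range(episodes):
        state = "start"
        while GOAL_NET[state]:  # run until a terminal goal is reached
            actions = GOAL_NET[state]
            if random.random() < epsilon:
                action = random.choice(actions)          # explore
            else:
                action = max(actions, key=lambda a: q[(state, a)])  # exploit
            reward = REWARD[(state, action)]
            nxt = action
            future = max((q[(nxt, a)] for a in GOAL_NET[nxt]), default=0.0)
            # Q-learning update rule
            q[(state, action)] += alpha * (reward + gamma * future - q[(state, action)])
            state = nxt
    return q

q = q_learning()
best = max(GOAL_NET["start"], key=lambda a: q[("start", a)])
print(best)
```

Note how the learned policy differs from greedy one-step selection: the immediate reward of `g2` (3.0) is higher, but the discounted return through `g1` (1.0 + 0.9 × 5.0 = 5.5) beats that through `g2` (3.0 + 0.9 × 2.0 = 4.8), so the agent learns to select `g1`, without ever needing to search the whole net at decision time.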
Acknowledgments
The authors would like to thank the anonymous reviewers for their very helpful comments. This work was partially supported by the National Natural Science Foundation of China (No. 61173019), the Bairen Plan of Sun Yat-sen University, and a major project of the Ministry of Education of China (No. 10JZD0006).
Cite this article
Zhang, H., Luo, X., Miao, C. et al. Adaptive goal selection for agents in dynamic environments. Knowl Inf Syst 37, 665–692 (2013). https://doi.org/10.1007/s10115-013-0645-7