Int. j. inf. tecnol.
https://doi.org/10.1007/s41870-017-0079-7

ORIGINAL RESEARCH

Improved decision making in multiagent system for diagnostic application using cooperative learning algorithms

Deepak A. Vidhate (1) (dvidhate@yahoo.com) · Parag Kulkarni (2) (parag.india@gmail.com)

Received: 17 May 2017 / Accepted: 22 December 2017
© Bharati Vidyapeeth's Institute of Computer Applications and Management 2017

1 Department of Computer Engineering, College of Engineering, Shivajinagar, Pune, India
2 iKnowlation Research Labs Pvt. Ltd, Shivajinagar, Pune, India

Abstract: The cooperative nature of a multiagent system builds up more understanding and data through the sharing of resources, so cooperation in a multiagent system yields higher efficiency and faster learning than learning alone. However, cooperative learning in a multiagent system faces several challenges that need attention. Making effective cooperative decisions that correctly and efficiently solve interacting problems requires agents to closely coordinate their actions during problem-solving, so various issues related to cooperative machine learning are implemented here. In the literature, reinforcement learning is mainly applied to game theory and robot applications; this paper gives a new approach in which reinforcement learning methods are applied to a diagnostic application. The novelty of the approach lies in the amalgamation of two methods, i.e. weighted strategy sharing (WSS) with an expertness parameter, which enhances learning performance. The weighted strategy sharing method is implemented with Sarsa(λ), Q(λ) and Sarsa learning for cooperation between agents, which was not implemented previously. A cooperative learning model with individual and cooperative learning is given in this paper. The weighted strategy sharing algorithms calculate the weight of each Q-table based on an expertness value. Variations of the WSS method with Q-learning and Sarsa learning are implemented, and the paper shows implementation results and a performance comparison of weighted strategy sharing with Q-learning, Q(λ), Sarsa learning and Sarsa(λ) algorithms.

Keywords: Cooperative learning · Q learning · Reinforcement learning · Sarsa learning · Weighted strategy sharing

1 Introduction

Consider a retail-market application with a number of shops all over the state, retailing many items or products to a large number of consumers. The details of every transaction, i.e. customer ID, date, items bought with their amounts, and the sum of money spent, are recorded at the sale windows, producing a huge bulk of data every day. The retailer needs to forecast who the likely consumers for a particular item are. A simple algorithm is not sufficient for this prediction; the accumulated data must be analyzed and converted into useful information that supports item forecasting. It is not known in advance which consumers are likely to buy one product rather than another [1]. It is understood that there is a process that explains the observed data, but the details of the underlying process are completely unknown. Consumer behavior is not totally arbitrary [2]: consumers do not purchase items at random. When a consumer purchases cold drinks, they may also purchase chips; a consumer may purchase ice cream in summer and hot tea in winter. There are fixed patterns in the data. It might not be possible to identify the entire process, but a good and useful estimate can still be created.
That estimate may not explain everything, but it may still capture some part of the data [3]. Though identifying the whole process may not be possible, models or regularities can still be identified. Such models help in understanding the process and in making forecasts. Forecasting is useful because the near future will not be much different from the past, so a forecast of the future is also likely to be right [4].

Many real-world applications involve more than one entity in improving an outcome. Consider a situation of retail stores in which store A trades clothes, store B trades jewelry, and store C trades footwear. In order to develop a single intelligent system for (certain aspects of) the marketing procedure, an internal model of all shops A, B, and C can be estimated. The only feasible answer is to permit the shops to build their individual strategies that precisely characterize their objectives and benefits [5]. These strategies must then be combined into the system with the aid of suitable techniques. The objective of each shop is to increase revenue by maximizing sales, i.e. yield maximization. Diverse factors must be considered here: the dependency between items, special discount policies, the dynamic nature of seasons, concessions, the market situation, etc. Different shops need to coordinate with one another to increase profit under various conditions. Numerous autonomous tasks that can be handled by individual agents could benefit from the cooperative nature of agents [5, 6].

The novelty of the approach is seen in three major contributions made by the paper. First, reinforcement learning methods are applied to the diagnostic application; in the literature, reinforcement learning is mainly implemented with game theory and robot applications. Second, the paper combines two methods, i.e. weighted strategy sharing with an expertness parameter, to enhance performance. Third, the weighted strategy sharing method is implemented with Sarsa(λ), Q(λ) and Sarsa learning for cooperation between the agents, which was not implemented previously.

The paper is organized as follows: Sect. 2 provides the concept of cooperative multiagent learning, and Sect. 3 describes the weighted strategy sharing methods using Q and Sarsa learning. Section 4 gives the experimental setup, and Sect. 5 presents the result comparison of all four algorithms, i.e. weighted strategy sharing using Q learning, Sarsa learning, Q(λ) learning and Sarsa(λ) learning. Final concluding remarks and future scope are mentioned in the conclusion.

2 Cooperative learning algorithms

Several multiagent learning systems are designed to speed up learning and/or to increase precision. Learning in MASs can be seen from different points of view. In the simplest form, each agent just learns individually, without any attention to others and without any cooperation [7]. But in a multiagent system, an agent's knowledge may have a positive effect on other agents' learning and knowledge acquisition; therefore, cooperative learning may result in higher efficiency [8]. For different people or organizations with different goals and information, interaction can be carried out using a multiagent system. These agents may cooperate to achieve a common goal by sharing their expertise and knowledge. Such teamwork leads to improved performance.

The cooperative nature of a multiagent system builds up more understanding and information through the sharing of resources, so teamwork in a multiagent system gives higher efficiency and faster learning than learning alone; working in a cooperative team therefore has significant advantages [8, 9]. Making effective cooperative decisions that correctly and efficiently solve interacting problems requires agents to closely coordinate their actions during problem-solving, so various issues related to cooperative machine learning are studied and implemented.

In strategy-sharing algorithms, agents in a multiagent scheme learn from all of the agents. The Q-learning algorithm of reinforcement learning is responsible for independent agent learning [10]. An agent combines the Q-tables of the other agents and takes their average as its new policy. The agents have no capability to detect expert agents, so the information of all of the agents is utilized uniformly. The plain average of the Q-tables is not useful if the agents have dissimilar abilities and expertise. Moreover, after every cooperation step the Q-tables of the agents become identical, which decreases the agents' flexibility in changing situations [11], as the sketch below illustrates.
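To make the plain strategy-sharing scheme above concrete, here is a minimal Python sketch of the averaging step, assuming each agent's Q-table is a NumPy array of shape (n_states, n_actions); the function name and data layout are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def average_strategy_sharing(q_tables):
    """Plain strategy sharing: every agent adopts the unweighted mean
    of all agents' Q-tables (illustrative sketch, not the paper's code)."""
    # q_tables: list of (n_states, n_actions) arrays, one per agent.
    mean_q = np.mean(q_tables, axis=0)
    # After this step every agent holds the same table, which is why the
    # text notes that adaptability to changing situations decreases.
    return [mean_q.copy() for _ in q_tables]
```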
A strategy-sharing method based on expertness detection has been proposed to overcome these limitations, in which each agent assigns a weight to the other agents' Q-tables [11, 12].

3 Weighted strategy sharing methods

In previous work on multiagent learning, coordination between agents is one-way among the different agents. In real-world applications, all of the agents can learn something from one another, even from the non-expert agents. Weighted strategy sharing for cooperative learning is therefore implemented: each agent assigns a weight to the knowledge of another agent and uses it according to that agent's expertness [13].

3.1 Weighted strategy sharing using Q learning

In the WSS method, the members of a collection of n homogeneous agents are trained in some environment, and the actions of one agent do not modify the learning environments of the others. Two modes of learning are used: Independent Learning mode and Cooperative Learning mode. Initially, all agents are in the Independent Learning mode. Agent i performs t_i learning trials; each trial begins from an arbitrary state and stops when the agent arrives at the destination state. All agents stop the Independent Learning mode once a particular number of independent trials has been performed (referred to as the coordination time) [14]. Each agent then switches to the Cooperative Learning mode, in which each learner assigns a weight to every other agent according to their expertness values, computes the weighted mean of the other agents' Q-tables, and uses the resulting table as its new Q-table [15, 16].

3.1.1 Independent learning based on Q-learning

The agent obtains reinforcement after completing every action. The Q-table, which estimates the long-term discounted reinforcement for every state-action pair, determines the learned strategy of the agent [14-16]. In Q-learning, an agent selects action a_i in state x with probability

P(a_i \mid x) = \frac{e^{Q(x, a_i)/\tau}}{\sum_k e^{Q(x, a_k)/\tau}}, \qquad (1)

where τ regulates the randomness of the choice.
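A short sketch of the Boltzmann action selection of Eq. (1), assuming the Q-table is stored as a NumPy array with one row per state (names are illustrative assumptions):

```python
import numpy as np

def boltzmann_action(q_row, tau, rng=None):
    """Sample an action from P(a_i|x) = exp(Q(x,a_i)/tau) / sum_k exp(Q(x,a_k)/tau),
    as in Eq. (1); tau controls the randomness of the choice."""
    rng = rng or np.random.default_rng()
    prefs = np.asarray(q_row, dtype=float) / tau
    prefs -= prefs.max()           # shift for numerical stability
    probs = np.exp(prefs)
    probs /= probs.sum()
    return rng.choice(len(probs), p=probs)
```

A large τ makes the choice nearly uniform (exploration); a small τ concentrates the probability on the greedy action.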
The agent carries out the action, obtains an immediate reward r, moves to the subsequent state y, and modifies Q as:

Q_i^{new}(x_i, a_i) := (1 - \beta_i) Q_i^{old}(x_i, a_i) + \beta_i \left( r_i + \gamma_i V(y_i) \right), \qquad (2)

where β is the learning rate and γ (0 < γ < 1) is a discount parameter. Q is improved steadily, and the agent learns as it explores the state space.

3.1.2 Measuring the expertness values

Expertness is defined as the "personification of knowledge and skills within persons". In human societies, it is observed that a learner assesses others' understanding with respect to their expertness: each learner attempts to find the best assessment technique to determine how trustworthy the others' knowledge is. In the WSS method as well, the weight of each agent's knowledge is calculated so that the team's training competence is increased. This principle assigns extra weight to agents that have acquired more rewards and fewer punishments [17]. Expertness is represented as the total of the reward signals:

e_i^{Nrm} = \sum_t r_i(t), \qquad (3)

where r_i(t) is the reinforcement signal that the environment gives to agent i in step t. It is referred to as normal expertness, as it is just the algebraic sum of the rewards [18].

3.1.3 Weight-assigning mechanism

The Q-tables of more expert agents are used by the learner in order to reduce the amount of coordination necessary to swap Q-tables; hence, the fractional weights of the inexpert agents are treated as zero. Learner i assigns the weight to the knowledge of agent j as:

W_{ij} = \begin{cases} 1 - \alpha & \text{if } i = j \\ \alpha \, \dfrac{e_j - e_i}{\sum_{k=1}^{n} (e_k - e_i)} & \text{if } e_j > e_i \\ 0 & \text{otherwise,} \end{cases} \qquad (4)

where 0 < α < 1 is the impressibility parameter, expressing how much agent i depends on the other agents' information, e_i and e_j are the expertness values of agents i and j, and n is the total number of agents. The weights assigned by each agent to the other agents' information are shown in Fig. 1: w_12 is the weight assigned by agent 1 to the information of agent 2, and w_21 is the weight assigned by agent 2 to the information of agent 1 [18, 19].

[Fig. 1: Cooperative learning by weighted strategy sharing]

Algorithm 1: WSS algorithm for agent a_i

1. if in Independent Learning mode then
2.   s_i := GetCurrentState()
3.   a_i := ActionSelection()
4.   Action(a_i)
5.   r_i := Reward()
6.   y_i := GoToNextState()
7.   V(y_i) := max_b Q(y_i, b)
8.   Q_i^new(s_i, a_i) := (1 - β_i) Q_i^old(s_i, a_i) + β_i (r_i + γ_i V(y_i))
9.   e_i := UpdateExpertness(r_i)
10. else {Cooperative Learning}
11.   Loop j := 1 to n
12.     calculate normal expertness as e_j^Nrm := Σ_t r_j(t)
13.   Q_i^new := 0
14.   Loop j := 1 to n
15.     W_ij := ComputeWeights(i, j, e_1 ... e_n)
16.     Q_j^old := GetQ(A_j)
17.     Q_i^new := Q_i^new + W_ij * Q_j^old

Variations of the weighted strategy sharing method have been introduced by replacing Q learning in the Independent Learning mode with Sarsa learning, Q(λ) learning and Sarsa(λ) learning [10, 11]. The results of each algorithm have been obtained and compared.
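The Python sketch below illustrates the cooperative phase of Algorithm 1: normal expertness as the algebraic sum of rewards (Eq. 3), the weight assignment (Eq. 4), and the new Q-table as a weighted mean of all agents' tables. Note that Eq. (4) is reconstructed from a damaged formula, so both the weight rule and all names here are assumptions for illustration, not the authors' exact implementation.

```python
import numpy as np

def normal_expertness(rewards):
    """Eq. (3): algebraic sum of the reinforcement signals received so far."""
    return float(np.sum(rewards))

def wss_weights(expertness, i, alpha):
    """Reconstructed Eq. (4): agent i keeps weight 1 - alpha on its own table,
    spreads alpha over strictly more expert teammates in proportion to their
    expertness surplus, and gives zero weight to less expert agents."""
    e = np.asarray(expertness, dtype=float)
    w = np.zeros(len(e))
    surplus = np.where(e > e[i], e - e[i], 0.0)
    if surplus.sum() > 0.0:
        w = alpha * surplus / surplus.sum()
        w[i] = 1.0 - alpha
    else:
        w[i] = 1.0                 # no more expert teammate: keep own table
    return w

def cooperative_step(q_tables, expertness, alpha=0.5):
    """Cooperative phase of Algorithm 1 (lines 11-17): each agent's new
    Q-table is the weighted mean of all agents' old Q-tables."""
    n = len(q_tables)
    return [sum(wss_weights(expertness, i, alpha)[j] * q_tables[j]
                for j in range(n))
            for i in range(n)]
```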
3.2 Weighted strategy sharing using Sarsa learning

Weighted strategy sharing (WSS) using Sarsa for cooperative learning is implemented. In the WSS method, every agent assigns a weight to the others' information and uses it depending on the amount of its teammate's expertness [20].

Algorithm 2: WSS using Sarsa learning

1. if in Independent Learning mode then
2.   s_i := GetCurrentState()
3.   a_i := ActionSelection()
4.   Action(a_i)
5.   r_i := Reward()
6.   Q_i^new(s_i, a_i) := (1 - β_i) Q_i^old(s_i, a_i) + β_i (r_i + γ_i Q_i^new(s_i, a_i) - Q_i^old(s_i, a_i))
7.   s_i := FindNextState()
8.   a_i := NewAction()
9.   e_i := UpdateExpertness(r_i)
10. else {Cooperative Learning}
11.   Loop j := 1 to n
12.     calculate normal expertness as e_j^Nrm := Σ_t r_j(t)
13.   Q_i^new := 0
14.   Loop j := 1 to n
15.     W_ij := ComputeWeights(i, j, e_1 ... e_n)
16.     Q_j^old := GetQ(A_j)
17.     Q_i^new := Q_i^new + W_ij * Q_j^old

3.2.1 Independent learning based on Sarsa learning

Sarsa learning is used for the Independent Learning mode. Sarsa is an on-policy version of Q-learning in which the policy is also used to determine the next action. On-policy Sarsa makes use of the strategy derived from the Q values to decide the subsequent action a, and uses that action's Q value to determine the temporal difference; it does not wait to evaluate all possible actions before selecting the best one. On-policy methods calculate the value of a policy while using it to select actions: they estimate the Q values, i.e. the action values for the current strategy, and then improve the strategy slowly based on the rough values for the present strategy. The policy improvement is carried out in the simplest way, using an ε-greedy strategy with respect to the current action-value estimates. The Sarsa learning algorithm is used for this purpose [19, 20].

3.3 Weighted strategy sharing using Q(λ) learning

Weighted strategy sharing (WSS) using Q(λ) for cooperative learning is implemented. In the WSS method, every agent assigns a weight to the others' information and uses it depending on the amount of its teammate's expertness [21].

Algorithm 3: WSS using Q(λ) learning

1. if in Independent Learning mode then
2.   s_i := GetCurrentState()
3.   a_i := ActionSelection()
4.   Action(a_i)
5.   r_i := Reward()
6.   a* := argmax_b Q(s_i, b)
7.   δ := r + γ Q(s'_i, a*) - Q(s_i, a_i)
8.   e(s_i, a_i) := e(s_i, a_i) + 1
9.   for all s_i, a_i:
10.    Q(s_i, a_i) := Q(s_i, a_i) + α δ e(s_i, a_i)
11.    if a'_i = a* then e(s_i, a_i) := γ λ e(s_i, a_i) else e(s_i, a_i) := 0
12.  s_i := FindNextState(), a_i := NewAction()
13.  e_i := UpdateExpertness(r_i)
14. else {Cooperative Learning}
15.   Loop j := 1 to n
16.     calculate normal expertness as e_j^Nrm := Σ_t r_j(t)
17.   Q_i^new := 0
18.   Loop j := 1 to n
19.     W_ij := ComputeWeights(i, j, e_1 ... e_n)
20.     Q_j^old := GetQ(A_j)
21.     Q_i^new := Q_i^new + W_ij * Q_j^old

3.3.1 Independent learning based on Q(λ) learning

Q(λ) does not look ahead to the end of the episode in its backup; it only looks as far as the next exploratory action. Q(λ) looks one action beyond the first exploration, using its knowledge of the action values. The trace update is considered as happening in two stages [20, 21].

3.4 Weighted strategy sharing using Sarsa(λ) learning

Weighted strategy sharing (WSS) using Sarsa(λ) for cooperative learning is implemented. In the WSS method, every agent assigns a weight to the others' information and uses it depending on the amount of its teammate's expertness.

Algorithm 4: WSS using Sarsa(λ) learning

1. if in Independent Learning mode then
2.   s_i := GetCurrentState()
3.   a_i := ActionSelection()
4.   Action(a_i)
5.   r_i := Reward()
6.   δ := r + γ Q(s'_i, a'_i) - Q(s_i, a_i)
7.   e(s_i, a_i) := e(s_i, a_i) + 1
8.   for all s_i, a_i:
9.     Q(s_i, a_i) := Q(s_i, a_i) + α δ e(s_i, a_i)
10.    e(s_i, a_i) := γ λ e(s_i, a_i)
11.  s_i := FindNextState()
12.  a_i := NewAction()
13.  e_i := UpdateExpertness(r_i)
14. else {Cooperative Learning}
15.   Loop j := 1 to n do
16.     calculate normal expertness as e_j^Nrm := Σ_t r_j(t)
17.   Q_i^new := 0
18.   Loop j := 1 to n do
19.     W_ij := ComputeWeights(i, j, e_1 ... e_n)
20.     Q_j^old := GetQ(A_j)
21.     Q_i^new := Q_i^new + W_ij * Q_j^old

3.4.1 Independent learning based on Sarsa(λ) learning

The eligibility-trace version of Sarsa is called Sarsa(λ) [22]. The trace for the state-action pair (x, y) is denoted by e_t(x, y); substituting state-action variables for state variables, the update becomes

Q_{t+1}(x, y) = Q_t(x, y) + \alpha \, \delta_t \, e_t(x, y) \quad \text{for all } x, y,

where

\delta_t = r_{t+1} + \gamma \, Q_t(x_{t+1}, y_{t+1}) - Q_t(x_t, y_t)

and

e_t(x, y) = \begin{cases} \gamma \lambda \, e_{t-1}(x, y) + 1 & \text{if } x = x_t \text{ and } y = y_t \\ \gamma \lambda \, e_{t-1}(x, y) & \text{otherwise.} \end{cases}

The Sarsa(λ) trace method reinforces many actions of the sequence [23].
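A minimal sketch of this Sarsa(λ) independent-learning step, assuming tabular Q and trace arrays (the function and variable names are illustrative assumptions):

```python
import numpy as np

def sarsa_lambda_step(Q, E, s, a, r, s_next, a_next, alpha, gamma, lam):
    """One Sarsa(lambda) backup as in Sect. 3.4.1: Q and E are
    (n_states, n_actions) arrays, updated in place and returned."""
    # delta_t = r_{t+1} + gamma * Q_t(x_{t+1}, y_{t+1}) - Q_t(x_t, y_t)
    delta = r + gamma * Q[s_next, a_next] - Q[s, a]
    E[s, a] += 1.0        # accumulate the trace of the visited pair
    Q += alpha * delta * E  # back up every pair in proportion to its trace
    E *= gamma * lam        # decay all traces
    return Q, E
```

Because every pair with a nonzero trace is updated at once, a single reward propagates along the whole recent action sequence, which is what the trace method's reinforcement of "many actions of the sequence" refers to.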
4 Experimental setup

The goal is to maximize the sale of products, which depends on the price of the product, the customer's age and the period of sale. This is the information available to each agent, i.e. shop, and together it forms the state of the environment. The final result is to increase revenue by increasing the total retailing of products.

4.1 Input dataset

The action set is defined as the retailing of the possible items, i.e. A = {p1, p2, p3, ..., p10}; hence action a ∈ A. The state of the system is a queue of consumers in a given period for a given store agent [24]. A state can be defined as

S(t) = (s_1(t), s_2(t), n),

where s_1 ∈ {Y, M, O} is the consumer in the queue, i.e. a young-age, middle-age or old-age consumer, s_2 ∈ {H, M, L} is the highest, medium or lowest price, and n ∈ {1, 2, ..., 12} is the month of the item sale. The system thus has a minimum of 3 × 3 × 12 = 108 states, and the number of state-action pairs increases as the number of transactions increases. For simplicity, a single state is assumed for each transaction; otherwise the state space becomes infinitely large [23, 24]. The shop agent observes the queue and decides on a product, i.e. an action, for each customer/state. After every sale, a reward is given to the agent. Table 1 shows a snapshot of the dataset generated for a single shop agent.

Table 1 Dataset

Transaction ID | Age | Price | Month | Action selected (product)
1 | Y | L | 1 | P1, P2, P4
2 | Y | M | 1 | P2, P3
3 | Y | H | 1 | P3, P4
4 | M | L | 1 | P1, P2
5 | M | M | 1 | P1, P2, P3
6 | M | H | 1 | P4, P2
7 | O | L | 1 | P1, P3

In a particular season, the sale of one shop increases. With the help of cooperative learning, the other shops learn about the increase in sales and can take the necessary actions for their own profit maximization.

4.2 State and action selection

The action selection mechanism in Q learning is responsible for choosing the actions that the agent carries out throughout the training procedure [23-25]. Let s = (s_1, s_2, ..., s_m) be the vector of selection probabilities; then the probability s_i of selecting action i is given by

s_i = \begin{cases} (1 - \epsilon) + \epsilon / m & \text{if the Q value of action } i \text{ is maximal} \\ \epsilon / m & \text{otherwise,} \end{cases}

where m is the number of actions in the set. One way to assign such probabilities is

P(x_i \mid y) = \frac{C^{Q'(x, y_i)}}{\sum_j C^{Q'(x, y_j)}},

where P(x_i | y) is the action selection probability of y_i, x is the current state, and C > 0 is a constant. A high value of C assigns high probabilities to the action with maximum reward, and a small value of C assigns higher probabilities to the other actions, i.e. those with minimum reward [25].
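To illustrate Sects. 4.1-4.2, the sketch below encodes the (age, price, month) tuple into one of the 108 discrete states and implements the ε-greedy selection rule; the encoding order and all names are assumptions for illustration, not the paper's code.

```python
import numpy as np

AGES, PRICES = "YMO", "HML"

def state_index(age, price, month):
    """Map (s1, s2, n) from Sect. 4.1 to one of the 3 * 3 * 12 = 108 states."""
    return (AGES.index(age) * 3 + PRICES.index(price)) * 12 + (month - 1)

def epsilon_greedy(q_row, epsilon, rng=None):
    """Sect. 4.2: the greedy action gets probability (1 - eps) + eps/m,
    every other action gets eps/m, where m is the number of actions."""
    rng = rng or np.random.default_rng()
    m = len(q_row)
    probs = np.full(m, epsilon / m)
    probs[int(np.argmax(q_row))] += 1.0 - epsilon
    return rng.choice(m, p=probs)

# Example: the first row of Table 1, a young customer, low price, month 1.
s = state_index("Y", "L", 1)
```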
5 Results

The weighted strategy sharing algorithms, i.e. one-step Q learning, Q(λ) learning, Sarsa learning and Sarsa(λ) learning, are first compared on reward vs. episodes, as shown in Fig. 2. Sarsa(λ) learning gives the highest rewards of the four methods due to the addition of eligibility traces; however, its graph fluctuates. Sarsa learning gives the second-highest reward values, with rewards decreasing smoothly as the number of episodes increases. Q(λ) learning receives low rewards compared to Sarsa and Sarsa(λ) learning; there is a huge difference between the rewards received by Q(λ) learning (2500-3000) and Sarsa(λ) learning (7000-8000). One-step Q learning receives the lowest rewards; its rewards decrease as the number of episodes increases, and after some episodes (about 50) they remain constant.

[Fig. 2: Graph of reward vs. episode for the four algorithms]

The algorithms are next compared on reward vs. learning rate, as shown in Fig. 3. Sarsa(λ) learning again gives the highest rewards due to the addition of eligibility traces, and an increase in the learning rate steadily increases the number of rewards received by the agent. Sarsa learning gives the second-highest reward values, but its behavior is not fixed: at learning rate 0.3 the number of rewards received drops suddenly, and after that it increases. Q(λ) learning receives rewards comparable to Sarsa and Sarsa(λ) learning; at learning rate 0.3, the rewards received by Q(λ) and Sarsa learning are the same. The difference between the rewards received by Q(λ) learning, Sarsa learning and Sarsa(λ) learning is not large. One-step Q learning receives the lowest rewards, within the range of 500; after learning rate 0.4 it shows some increase in rewards.

[Fig. 3: Graph of reward vs. learning rate for the four algorithms]

The algorithms are finally compared on reward vs. discount rate, as shown in Fig. 4. Sarsa(λ) learning gives the highest rewards due to the addition of eligibility traces; however, its graph fluctuates. Sarsa learning gives the second-highest reward values, with rewards increasing smoothly as the discount rate increases. Q(λ) learning receives moderate rewards compared to Sarsa and Sarsa(λ) learning; there is a large difference between the rewards received by Q(λ) learning (5000-6000) and one-step Q learning (1000-1500).

[Fig. 4: Graph of reward vs. discount rate for the four algorithms]

These results demonstrate that a shop agent can successfully make use of reinforcement learning to select items dynamically and increase its profit. This is a promising approach for profit maximization in retail market environments with limited information. In cooperative learning with the weighted strategy sharing algorithm, two agents use one another's knowledge and action sets; after learning cooperatively from each other, each one obtains its own Q-table. Significant improvement is seen in the results compared to plain multiagent learning, as the agents receive more knowledge. Both agents enhance the sale of products to increase revenue by learning cooperatively. The graphs above demonstrate the performance of the weighted strategy sharing algorithms with normal expertness for rewards with respect to three parameters: discount rate, learning rate and number of episodes. Weighted strategy sharing with normal expertness implemented with Sarsa(λ), Sarsa and Q(λ) learning outperforms one-step Q learning, receiving the maximum rewards for these three algorithms. The profit calculated by each shop agent directly depends on the rewards received by that agent, so the three shop agents can obtain the maximum profit by following Sarsa(λ), Sarsa or Q(λ) learning. In other words, cooperation based on normal expertness gives more benefit in terms of profit for the three shop agents.
The results obtained by the proposed cooperation methods show that such methods can bring about quick convergence of agents in a dynamic environment. They also show that cooperative methods perform well in dense, incomplete and complex situations.

6 Conclusion

Cooperative learning algorithms are more efficient and effective and produce the best results, and learning algorithms are well suited for decision making. In cooperative learning, sharing of more knowledge and information is possible, all agents' knowledge is used, and the problem is solved jointly. The performance of the cooperative learning algorithms is improved compared to the plain multiagent learning approach. In the literature, reinforcement learning is mainly implemented with game theory and robot applications; this paper gives an approach in which reinforcement learning methods are applied to the diagnostic application. The combination of two methods, i.e. weighted strategy sharing with an expertness parameter, certainly enhances learning performance. The weighted strategy sharing method is implemented with Sarsa(λ), Q(λ) and Sarsa learning for cooperation between the agents, which was not implemented previously. However, these methods are still unable to identify the more expert agent, as they calculate the expertness value using only the algebraic sum of the reinforcement signals. Hence, the future scope of this paper shall emphasize enhancing the cooperative learning algorithms for decision making with different expertness measures.

References

1. Vidhate DA, Kulkarni P (2017) A framework for improved cooperative learning algorithms with expertness (ICLAE). In: International Conference on Advanced Computing and Communication Technologies, Advances in Intelligent Systems and Computing, vol 562. Springer, Singapore, pp 149–160
2. Vidhate DA, Kulkarni P (2017) Expertise based cooperative reinforcement learning methods (ECRLM). In: International Conference on Information & Communication Technology for Intelligent Systems, Smart Innovation, Systems and Technologies, vol 84. Springer, Cham, pp 350–360
3. Park K-H, Kim Y-J (2015) Modular Q-learning based multi-agent cooperation for robot soccer. Robot Auton Syst 35:3026–3033
4. Camara M, Bonham-Carter O, Jumadinova J (2015) A multiagent system with reinforcement learning agents for biomedical text mining. In: Proceedings of the 6th ACM Conference on Bioinformatics, Computational Biology and Health Informatics (BCB '15). ACM, New York, pp 634–643
5. Vidhate DA, Kulkarni P (2016) Innovative approach towards cooperation models for multi-agent reinforcement learning (CMMARL). In: Communications in Computer and Information Science, vol 628. Springer, pp 468–478
6. Iima H, Kuroe Y (2015) Swarm reinforcement learning methods improving certainty of learning for a multi-robot formation problem. In: CEC, pp 3026–3033
7. Vidhate DA, Kulkarni P (2017) Enhanced cooperative multiagent learning algorithms (ECMLA) using reinforcement learning. In: International Conference on Computing, Analytics and Security Trends (CAST). IEEE, pp 556–561
8. Al-Khatib AM (2011) Cooperative machine learning method. World Comput Sci Inform Technol J 1(9):380–383
9. Araabi BN, Mastoureshgh S, Ahmadabadi MN (2010) A study on expertise of agents and its effects on cooperative Q-learning. IEEE Trans Evol Comput 14:23–57
10. Vidhate DA, Kulkarni P (2016) New approach for advanced cooperative learning algorithms using RL methods (ACLA). In: Proceedings of the Third International Symposium on Computer Vision and the Internet. ACM, pp 12–20
11. Berenji HR, Vengerov D (2000) Learning, cooperation, and coordination in multi-agent systems. In: Proceedings of the Ninth IEEE International Conference on Fuzzy Systems
12. de Cote EM, Lazaric A, Restelli M (2006) Learning to cooperate in multi-agent social dilemmas. In: Autonomous Agents & Multi-Agent Systems, pp 783–785
13. Vidhate DA, Kulkarni P (2016) Performance enhancement of cooperative learning algorithms by improved decision making for context-based application. In: International Conference on Automatic Control and Dynamic Optimization Techniques (ICACDOT). IEEE, pp 246–252
14. Tao J-Y, Li D-S (2006) Cooperative strategy learning in multi-agent environment with continuous state space. In: IEEE International Conference on Machine Learning and Cybernetics
15. Panait L, Luke S (2005) Cooperative multi-agent learning: the state of the art. J Auton Agents Multi-Agent Syst 11(3):387–434
16. Vidhate DA, Kulkarni P (2017) Multi-agent cooperation models by reinforcement learning (MCMRL). Int J Comput Appl 176(1):25–29
17. Nagendra Prasad MV, Lesser VR (1999) Learning situation-specific coordination in cooperative multi-agent systems. J Auton Agents Multi-Agent Syst 2(2):173–207
18. Vidhate DA, Kulkarni P (2016) Enhancement in decision making with improved performance by multiagent learning algorithms. IOSR J Comput Eng 1(18):18–25
19. Vidhate DA, Kulkarni P (2016) Single agent learning algorithms for decision making in diagnostic applications. SSRG Int J Comput Sci Eng 3(5):2348–8387
20. Abbasi Z, Abbasi MA (2002) Reinforcement distribution in a team of cooperative Q-learning agents. In: Proceedings of the Ninth ACIS International Conference on Software Engineering, Artificial Intelligence, Networking, and Parallel/Distributed Computing
21. Vidhate DA, Kulkarni P (2016) Implementation of multiagent learning algorithms for improved decision making. Int J Comput Trends Technol 35(2):60–66
22. Choi Y-C, Ahn H-S (2010) A survey on multi-agent reinforcement learning: coordination problems. In: IEEE/ASME International Conference on Mechatronics and Embedded Systems and Applications, pp 81–86
23. Vidhate DA, Kulkarni P (2016) A step toward decision making in diagnostic applications using single agent learning algorithms. Int J Comput Sci Inform Technol 7(3):1337–1342
24. Camara M, Bonham-Carter O, Jumadinova J (2015) A multiagent system with reinforcement learning agents for biomedical text mining. In: Proceedings of the 6th ACM Conference on Bioinformatics, Computational Biology and Health Informatics (BCB '15). ACM, New York, pp 634–643
25. Vidhate DA, Kulkarni P (2014) Multilevel relationship algorithm for association rule mining used for cooperative learning. Int J Comput Appl 86(4):20–27