Fuzzy Sets and Systems 153 (2005) 371–401
www.elsevier.com/locate/fss

Genetic learning of fuzzy cognitive maps

Wojciech Stach, Lukasz Kurgan*, Witold Pedrycz, Marek Reformat
Department of Electrical and Computer Engineering, University of Alberta, Edmonton, Alberta, Canada T6G 2V4

Received 27 August 2004; received in revised form 14 December 2004; accepted 18 January 2005. Available online 3 March 2005.

Abstract

Fuzzy cognitive maps (FCMs) are a convenient, simple, and powerful tool for the simulation and analysis of dynamic systems. They were originally developed in the mid-1980s by Kosko, and have since been successfully applied in numerous domains, such as engineering, medicine, control, and political affairs. Their popularity stems from the simplicity and transparency of the underlying model. At the same time, FCMs are hindered by the necessity of involving domain experts to develop the model. Since human experts are subjective and can handle only relatively simple networks (maps), there is an urgent need for methods of automated generation of FCM models. This study proposes a novel learning method that generates FCM models from input historical data, without human intervention. The proposed method is based on genetic algorithms and requires only a single state vector sequence as input. The paper proposes and experimentally compares several design alternatives of the genetic optimization, and thoroughly tests and discusses the best design. Extensive benchmarking tests, which involve 200 FCMs of varying size and connection density, performed on both synthetic and real-life data, quantify the performance of the method and emphasize its suitability.

© 2005 Elsevier B.V. All rights reserved.

Keywords: Fuzzy cognitive maps; Dynamic system modelling; Genetic algorithms; Decision analysis

1. Introduction

Fuzzy cognitive maps (FCMs) are a soft computing methodology introduced by Kosko in 1986 [16] as an extension of cognitive maps.
They are used for modeling of dynamic systems [45,33].

* Corresponding author. Tel.: +1 780 492 5488; fax: +1 780 492 1811. E-mail addresses: wstach@ece.ualberta.ca (W. Stach), lkurgan@ece.ualberta.ca (L. Kurgan), pedrycz@ece.ualberta.ca (W. Pedrycz), reform@ece.ualberta.ca (M. Reformat).

0165-0114/$ - see front matter © 2005 Elsevier B.V. All rights reserved. doi:10.1016/j.fss.2005.01.009

An FCM represents a given system as a collection of concepts and mutual relations among them. FCMs are usually classified as neuro-fuzzy systems, which are capable of incorporating and adapting human knowledge [31]. Modelling dynamic systems with FCMs offers several advantages. Most importantly, they are simple and intuitive to understand, both in terms of the underlying formal model and its execution. They are also characterized by flexibility of system design and control, a comprehensible structure and operation, adaptability to a given domain, and a capability for abstract representation and fuzzy reasoning [22]. FCM models have been developed and used in numerous application areas such as electrical engineering, medicine, political science, international relations, military science, history, and supervisory systems. Examples of specific applications include medical diagnosis [9], analysis of electrical circuits [39], analysis of failure modes effects [32], fault management in distributed network environments [27], modeling and analysis of business performance indicators [13], modeling of supervisors [40], modeling of software development projects [37,38], modeling of plant control [11], modeling of political affairs in South Africa [15], and modeling of virtual worlds [7]. The diversity and number of applications clearly show the popularity of this modelling technique, justifying further research to enhance it.
However, FCM development methods are far from complete and well-defined, mainly because of deficiencies in the underlying theoretical framework [21]. According to the literature, the development of FCM models almost always relies on human knowledge [3]. As a consequence, the developed models strongly depend on the subjective beliefs of the expert(s) in a given domain. A few algorithms for automated or semi-automated learning of FCMs have been proposed, but none of them provides a formalized approach suitable for ensuring FCM convergence [30]. Some of the proposed learning methods are applicable only to FCMs with binary states, require multiple input datasets, which might be difficult to obtain, or require human intervention during the learning process. To this end, the aims of this paper are to:

• introduce a new learning method that develops FCM models with continuous states directly from an experimental dataset and without human intervention, providing a fully automated solution to the problem of learning a general class of FCMs;
• compare several genetic-algorithm-based designs of the learning method and, based on an extensive set of experiments, select the most effective design;
• carry out well-organized, thorough tests covering a large number of diverse types of FCMs, and arrive at firm design guidelines.

The remainder of this paper is organized as follows. Section 2 presents theoretical background concerning the model and learning methods, including standard development methods for FCM models and a brief history of state-of-the-art algorithms for learning FCMs (Section 2.2). Section 3 introduces the proposed learning approach, while Section 4 presents a comprehensive experimental evaluation and discussion of the achieved results. Finally, Section 5 covers conclusions and future research directions.

2.
Fuzzy cognitive maps (FCMs)

This section presents a historical overview of FCMs along with detailed background information concerning both the underlying model and the associated learning methods. First, the theoretical model is discussed and specific examples of FCMs reported in the scientific literature are reviewed. Next, the related work is presented: standard, expert-based development methods and learning methods for FCM models are discussed and compared with the proposed learning method.

2.1. History and background

Political scientist Robert Axelrod originally proposed cognitive maps in 1976 [4] as a tool for representing social scientific knowledge. A cognitive map is a simple directed graph consisting of nodes and edges. The nodes represent concepts relevant to a given domain, and the causal relationships between them are depicted by directed edges. Each edge is associated with a positive or negative sign that expresses the type of relationship. A positive edge from node A to node B indicates a positive influence of A on B: an increase in the value of node A leads to an increase in the value of node B, and vice versa. A negative edge from node A to node B reflects a negative relationship: an increase in the value of A leads to a decrease in the value of B. In many cases, however, this approach turned out to be insufficient because of its limited ability to represent causality, which typically is not of a plain two-valued (Boolean) character (i.e. captured by connections set to −1 and +1), but rather continuous, i.e. expressed by a set of positive and negative numeric values. These generic maps were significantly enhanced by Kosko, who introduced FCMs. The most significant improvement lies in the way causal relationships are represented.
Instead of using only a sign, each edge is associated with a number (weight) that determines the degree of the causal relation. This, in turn, allows encoding knowledge about the strength of a relationship, which can now be described by a fuzzy term, such as weak, medium, strong, or very strong. In other words, the weight of the directed edge from node A to node B quantifies how much concept A causes B [18]. The strength of the relationship between two nodes (i.e. the weight value) is usually normalized to the [−1, 1] interval: −1 represents a full negative causal effect, +1 a full positive one, and 0 denotes no causal effect. As a result, an FCM model is fully described by a set of nodes (concepts) and the weighted edges (cause–effect relationships) between them. Apart from the graph representation, for computational purposes the model can be equivalently defined by a square matrix, called the connection matrix, which stores the weight values of the edges between the concepts represented by its rows and columns. A system with n nodes is thus represented by an n × n connection matrix. An example FCM model and its connection matrix are shown in Fig. 1. The FCM model has been applied in many different areas to express the dynamic behavior of a set of related concepts. A brief summary of several models reported in the literature and their basic characteristics is shown in Table 1. The characteristics include the number of nodes in the reported model, the density, which expresses the ratio of non-zero weights to the total number of weights, and the weight resolution (precision) of the map's connections. Precision refers to the minimal value by which two weights may differ from each other. The smaller the precision, the more accurately a given system can be described by the FCM model. On the other hand, low precision values make the process of finding the weights more difficult and subjective.
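To make the matrix representation concrete, the connection matrix of the public-health model of Fig. 1 can be stored directly as an array. The sketch below is our illustration (not code from the paper) and assumes numpy; entry [i, j] holds the weight of the edge from concept N(i+1) to concept N(j+1).

```python
import numpy as np

# Connection matrix of the Fig. 1 public-health FCM; rows are source
# concepts, columns are target concepts (zero diagonal: no self-loops).
E = np.array([
    [ 0.0, 0.0, 0.6, 0.9, 0.0,  0.0,  0.0],  # N1: number of people in a city
    [ 0.5, 0.0, 0.0, 0.0, 0.0,  0.0,  0.0],  # N2: migration into city
    [ 0.0, 0.6, 0.0, 0.0, 0.8,  0.0,  0.0],  # N3: modernization
    [ 0.0, 0.0, 0.0, 0.0, 0.0,  0.0,  0.9],  # N4
    [ 0.0, 0.0, 0.0, 0.0, 0.0, -0.8, -0.9],  # N5
    [-0.3, 0.0, 0.0, 0.0, 0.0,  0.0,  0.0],  # N6
    [ 0.0, 0.0, 0.0, 0.0, 0.0,  0.8,  0.0],  # N7
])

# Density as defined in the text: non-zero weights over all possible
# (off-diagonal) connections.
density = np.count_nonzero(E) / (E.shape[0] * (E.shape[0] - 1))
```

With 10 non-zero weights out of 42 possible connections, the density is about 24%, matching the entry for this model in Table 1.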
As Table 1 shows, FCMs are usually relatively small, typically consisting of 5–10 nodes. The small size is a result of the manual development of such maps, which usually relies on expert knowledge. Mutual relationships among a large number of concepts are hard to comprehend, analyze, and describe, which results in substantial difficulties in the construction of the corresponding maps. We also note that most FCMs are characterized by a relatively low density of about 20–30% (where by density we mean the percentage of existing connections out of all possible connections for the given number of nodes).

Fig. 1. FCM model for public city health issues and the corresponding connection matrix [23]. The concepts include the number of people in a city (N1), migration into city (N2), modernization (N3), sanitation facilities, amount of garbage, number of diseases, and bacteria per area. The connection matrix (rows are source concepts, columns are target concepts):

      N1     N2     N3     N4     N5     N6     N7
N1    0      0      0.6    0.9    0      0      0
N2    0.5    0      0      0      0      0      0
N3    0      0.6    0      0      0.8    0      0
N4    0      0      0      0      0      0      0.9
N5    0      0      0      0      0     -0.8   -0.9
N6   -0.3    0      0      0      0      0      0
N7    0      0      0      0      0      0.8    0

Table 1. Example applications of FCMs

Ref.     Application area                                  # Nodes  Density (%)  Weight precision
[23]     Public health issues                              7        24           0.1
[42]     Heat exchanger model                              5        50           0.025
[42]     Supervisor model for heat exchanger performance   5        75           0.001
[40]     Plant supervisor model                            9        25           0.01
[20,21]  Industrial control                                5        60           0.1
[25]     Crime and punishment model                        7        36           1
[25]     EMU taxes and transfers model                     6        27           1
[17]     Virtual squad of soldiers                         10       34           0.01
[25]     EMU and the risk of war model                     8        30           0.5
[1,3]    Simple model of a country                         5        35           0.2
[45]     E-business company                                7        40           0.1
[5]      Strategy formation model                          6        33           0.01
[36]     Evidence of multiple suspicious events            6        20           0.2
[43]     Process control model                             5        40           0.01
[41]     System for direct control of a process            8        23           0.01
As a matter of fact, most maps reported in the literature are sparsely connected. It is instructive to recall a formal definition of an FCM along with the necessary notation. Let R be the set of real numbers, N the set of natural numbers, K = [−1, 1] and L = [0, 1]. A fuzzy cognitive map F is a 4-tuple (N, E, C, f) where

1. N = {N1, N2, ..., Nn} is the set of n concepts forming the nodes of a graph.
2. E : (Ni, Nj) → eij is a function from N × N to K associating eij with a pair of concepts (Ni, Nj), where eij denotes the weight of the directed edge from Ni to Nj if i ≠ j, and eij equals zero if i = j. Thus E(N × N) = (eij) ∈ K^{n×n} is a connection matrix.
3. C : Ni → Ci is a function that associates with each concept Ni the sequence of its activation degrees, such that for t ∈ N, Ci(t) ∈ L is its activation degree at the moment t. C(0) ∈ L^n is the initial vector, specifying the initial values of all concept nodes, and C(t) ∈ L^n is the state vector at iteration t.
4. f : R → L is a transformation function, which defines the recurrence, for t ≥ 0, between C(t + 1) and C(t):

\[ C_i(t+1) = f\Big(\sum_{j=1,\, j \neq i}^{n} e_{ji}\, C_j(t)\Big), \quad \forall i \in \{1, \ldots, n\}. \tag{1} \]

Eq. (1) describes the functional model of an FCM, which is used to perform simulations of the system dynamics. A simulation consists of computing the state of the system, described by a state vector, over a number of successive iterations. The state vector specifies the current values of all concepts (nodes) in a particular iteration. The value of a given node is calculated from the values at the preceding iteration of the nodes that exert influence on it through cause–effect relationships (nodes connected to the given node). A number of improvements to the original FCM methodology have been proposed. One of them allows concepts to have non-zero edge values to themselves [43].
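The update rule (1) translates directly into code. The sketch below is our illustration (not code from the paper), assuming numpy; `fcm_step`, `simulate`, and the map used in the usage note are hypothetical names.

```python
import numpy as np

def fcm_step(E, C, f):
    """One iteration of Eq. (1): C_i(t+1) = f(sum over j != i of e_ji * C_j(t)).

    E is the n x n connection matrix with E[j, i] = e_ji, C is the current
    state vector, and f is the transformation function applied element-wise.
    """
    n = len(C)
    return np.array([f(sum(E[j, i] * C[j] for j in range(n) if j != i))
                     for i in range(n)])

def simulate(E, C0, f, steps):
    """Run the map for a number of iterations; return the state sequence."""
    states = [np.asarray(C0, dtype=float)]
    for _ in range(steps):
        states.append(fcm_step(E, states[-1], f))
    return states
```

For example, with a continuous (logistic) transformation function, repeated calls to `fcm_step` produce the kind of state vector sequence shown in Fig. 2, often settling to a fixed-point attractor.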
This paper exploits the original FCM approach, in which the functional model is defined by (1). The transformation function is used to confine (clip) the weighted sum to a certain range, usually [0, 1]. This normalization hinders quantitative analysis, but allows comparisons between nodes, which can be defined as active (value of 1), inactive (value of 0), or active to a certain degree (value between 0 and 1). The three most commonly used transformation functions are shown below:

• bivalent

\[ f(x) = \begin{cases} 0, & x \le 0, \\ 1, & x > 0, \end{cases} \tag{2} \]

• trivalent

\[ f(x) = \begin{cases} -1, & x \le -0.5, \\ 0, & -0.5 < x < 0.5, \\ 1, & x \ge 0.5, \end{cases} \tag{3} \]

• logistic

\[ f(x) = \frac{1}{1 + e^{-Cx}}, \tag{4} \]

where C is a parameter used to determine the proper shape of the function. Several simulation scenarios can be envisioned, depending on the transformation function [15]. With a discrete-output transformation function (e.g. the bivalent or trivalent function), the simulation either heads to a fixed state vector value, called a hidden pattern or fixed-point attractor, or keeps cycling between a number of fixed state vector values, which is known as a limit cycle. With a continuous-output transformation function (e.g. the logistic function), the fixed-point attractor and the limit cycle, as well as a so-called chaotic attractor, can appear. A chaotic attractor arises when the FCM continues to produce different state vector values in successive cycles.

Fig. 2. Example input data (values of nodes N1–N7 plotted over 20 iterations, converging to a fixed point).

Fig. 2 shows an example of a fixed-point attractor simulation. In a nutshell, simulation of an FCM results in a sequence of state vectors, which specify the state of the modeled system in successive iterations. The simulation results allow observation and analysis of each concept value, which represents the degree of its existence, over time.
Different scenarios can be considered by simulating the FCM with different initial conditions, represented by different initial state vectors.

2.2. Related work

In general, two approaches to the development of FCMs are used: manual and computational. Most, if not all, of the reported models were developed manually by domain expert(s) based on their knowledge of the application area. The experts design and implement an adequate model manually, based on their mental understanding of the modeled domain. Three main steps constitute this process [15]:

1. Identification of key domain issues or concepts.
2. Identification of causal relationships among these concepts.
3. Estimation of the strengths of the causal relationships.

The first two steps, which result in an initial draft of the FCM model, include identification of the concept nodes and of the relationships among them, represented by edges. This is performed manually, with pencil and paper, taking advantage of the FCM's graph representation. The main difficulty, however, is to accurately establish the weights (strengths) of the defined relationships. To achieve this, the following procedure may be used [15,42]:

1. The influence of one concept on another, for each pair of concepts, is determined as "negative", "positive" or "none".
2. All relationships are expressed in fuzzy terms, e.g. weak, medium, strong and very strong.
3. The established fuzzy expressions are mapped to numerical values, most frequently in the range from 0 to 1; for example, weak is mapped to 0.25, medium to 0.5, strong to 0.75, and very strong to 1.0.

When establishing the numerical values, analytical procedures such as the Analytical Hierarchy Process proposed by Saaty [34] may be applied [35]. A fuzzy cognitive map can be developed by a single expert or a group of experts [15]. Using a group of experts has the benefit of improving the reliability of the final model.
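The weighting procedure above amounts to a simple lookup. The sketch below is our illustration; the function name and dictionary are hypothetical, with the numeric values taken from the example mapping quoted in the text.

```python
# Example mapping of fuzzy linguistic terms to weights, as quoted in the text.
TERM_TO_WEIGHT = {"weak": 0.25, "medium": 0.5, "strong": 0.75, "very strong": 1.0}

def edge_weight(term, influence):
    """Map an expert's (term, influence) judgment to a weight in [-1, 1].

    influence is 'positive', 'negative', or 'none' (step 1 of the procedure);
    term is the fuzzy strength from step 2.
    """
    if influence == "none":
        return 0.0
    w = TERM_TO_WEIGHT[term]
    return w if influence == "positive" else -w
```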
The FCM model allows for relatively simple aggregation of knowledge coming from multiple experts. Usually each expert develops his or her own FCM model, and the models are later combined. Several procedures for combining multiple FCM models into a single one exist [17]. In general, the manual procedures for developing FCMs have a number of drawbacks. They require an expert who has knowledge of the modeled domain and, at the same time, knowledge of the FCM formalism. Since even medium-sized models involve a large number of parameters, i.e. weights, it is often difficult to obtain satisfactory performance. The development process may require many iterations and simulations before a suitable model is obtained. In the case of group development, additional parameters, such as credibility coefficients of the individual experts, also need to be estimated, which adds to the complexity of the overall process. Manual methods for developing FCM models have the further major disadvantage of relying on human knowledge, which implies subjectivity of the developed model and problems with unbiased assessment of its accuracy. Questions such as "I believe that this relationship is stronger than 0.5. Why did you choose this value to express it?" often cannot be answered in a justifiable way. Also, in the case of large and complex domains, the resulting FCM model requires a large number of concepts and connections to be established, which substantially adds to the difficulty of the manual development process [46]. These problems led to the development of computational methods for learning the FCM connection matrix, i.e. the causal relationships (edges) and their strengths (weights), from historical data. In this way, the expert knowledge is substituted by a set of historical data and a computational procedure that is able to automatically compute the connection matrix. A number of algorithms for learning the FCM model structure have recently been proposed.
In general, two main learning paradigms are used, i.e. Hebbian learning and genetic algorithms, but so far none of the proposed methods can be adopted as a formal methodology that ensures FCM convergence [30]. In one of the first attempts, Kosko proposed the simple Differential Hebbian Learning law (DHL) for learning FCMs [8]. This law correlates changes of causal concepts:

\[ \dot{e}_{ij} = -e_{ij} + \dot{C}_i \dot{C}_j, \tag{5} \]

where \( \dot{e}_{ij} \) is the change of the weight between the ith and jth concepts, \( e_{ij} \) is the current value of this weight, and \( \dot{C}_i, \dot{C}_j \) are the changes in the values of the ith and jth concepts, respectively. The learning process gradually updates the weights of all edges in the FCM graph until the desired connection matrix is found. In general, the weights of the outgoing edges of a given concept node are modified when the corresponding concept value changes. The weights are updated according to the following formula:

\[ e_{ij}(t+1) = \begin{cases} e_{ij}(t) + c_t \left[ \Delta C_i \Delta C_j - e_{ij}(t) \right], & \Delta C_i \neq 0, \\ e_{ij}(t), & \Delta C_i = 0, \end{cases} \tag{6} \]

where \( e_{ij} \) denotes the weight of the edge between concepts Ci and Cj, \( \Delta C_i \) represents the change in the Ci concept's activation value, t is the iteration number, and \( c_t \) is a learning decay coefficient. DHL was proposed in 1994, but no applications that used this approach to learning FCMs followed. In 2002, Vazquez presented an extension of the DHL algorithm that introduces new formulas to update the edge values [46]. The new algorithm, called Balanced Differential Algorithm (BDA), is based on a weight update formula in which the updated value depends on the values of all concepts that simultaneously act as a cause of change for the given concept. This method was applied only to FCMs with binary concept values, which significantly restricts its application areas. In 2003, Papageorgiou et al. developed another extension of Hebbian learning, called Nonlinear Hebbian Learning (NHL), to learn the connection matrix of an FCM [29].
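A minimal sketch of the per-edge update (6) follows; this is our illustration with hypothetical names (`dhl_update`, `dC_i`, `dC_j`), not code from [8].

```python
def dhl_update(e, dC_i, dC_j, c_t):
    """Differential Hebbian update of a single weight, as in Eq. (6).

    e    -- current weight e_ij(t)
    dC_i -- change in concept C_i (the source concept)
    dC_j -- change in concept C_j
    c_t  -- learning decay coefficient
    """
    if dC_i == 0:
        return e  # source concept unchanged: weight is kept as-is
    return e + c_t * (dC_i * dC_j - e)
```

When the two concepts change in the same direction the weight is pulled toward a positive value, and toward a negative value when they change in opposite directions, which is the Hebbian intuition behind the law.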
The main idea behind this method is to update only the weights of the edges initially suggested by the expert(s). The NHL algorithm thus requires human intervention before the learning process starts, which is a substantial disadvantage. Another Hebbian-based method for the design of FCMs was introduced in [30]. Active Hebbian Learning (AHL) introduces the determination of the sequence of activation of the concepts and improves the accuracy of the resulting FCMs, but it still requires some initial human intervention. The other main stream in computational methods for learning the FCM connection matrix involves application of genetic algorithms. In 2001, Koulouriotis et al. applied Genetic Strategy (GS) to compute the FCM cause–effect relationships, i.e. the weight values of the FCM model [21]. In this method, the learning process is based on a collection of input/output pairs, called examples. The inputs are defined as initial state vectors, whereas the outputs are final state vectors, i.e. the values of the state vector after the FCM simulation terminates. The method's main drawback is the need for multiple state vector sequences (input/output pairs), which might be difficult to obtain for many real-life problems. More recently, in 2003, Parsopoulos et al. applied the Particle Swarm Optimization (PSO) method, which belongs to the class of Swarm Intelligence algorithms, to learn the FCM connection matrix from historical data consisting of a sequence of state vectors that leads to a desired fixed-point attractor state [31,28]. The algorithm finds the connection matrix in a search space that is restricted to certain FCM concept values and imposes constraints on the connection matrix, all of which are specified by domain expert(s). This method was tested on only one small FCM model involving five concepts. Another recent work involving genetic algorithms was proposed by Khan and Chong, who performed a goal-oriented analysis of FCMs in 2003 [14].
Their learning method does not aim to compute the connection matrix, but rather to find an initial state vector that leads a predefined FCM (a map with a fixed connection matrix) to converge to a given fixed-point attractor or limit cycle solution. This method was also tested on only one FCM model.

2.3. Objectives, scope and motivation

This study aims to provide a learning method that avoids the disadvantages of the existing methods. It uses a real-coded genetic algorithm (RCGA) to develop the FCM connection matrix from historical data consisting of one sequence of state vectors. In contrast, the approach introduced in [21] requires a set of such sequences. The proposed method is fully automatic, i.e. in contrast to the NHL and AHL methods it does not require input from a domain expert. The RCGA algorithm learns the connection matrix of an FCM that uses a continuous transformation function, which is a more general problem than the one considered in [46]. Finally, the evaluation of the proposed method is performed in a very comprehensive manner. The tests involve ten-fold cross-validation experiments for FCMs of varying sizes and densities. The above methods were tested on several, or sometimes even only one, FCMs, and therefore lack the comprehensive set of experiments that would allow an appropriate assessment of accuracy and correctness. In contrast, the proposed method was tested on almost 200 different FCMs. To ease comparison between the above methods and the proposed learning algorithm, a brief summary is shown in Table 2. The table compares the methods on several factors, such as the learning goal, involvement of a domain expert, input historical data, type of transformation function, and learning strategy type. It also shows, in the "# nodes" column, for how many FCM models, and of what size, a given method was tested. All learning methods except the RCGA were tested on a single map of the indicated size; for the RCGA method, almost 50 maps of each given size were simulated. Values in bold indicate the main disadvantage of a given learning method. We note that the proposed method is a natural continuation of the research performed in the domain of learning FCM connection matrices.

Table 2. Overview of learning approaches applied to FCMs

Algorithm  Ref.        Learning goal      Human intervention  Type of data used^a  Transformation function  # nodes    Learning type
DHL        [8]         Connection matrix  No                  Single               N/A                      N/A        Hebbian
BDA        [46]        Connection matrix  No                  Single               Binary                   5,7,9      Modified Hebbian
NHL        [29]        Connection matrix  Yes&No^b            Single               Continuous               5          Modified Hebbian
AHL        [28]        Connection matrix  Yes&No^b            Single               Continuous               8          Modified Hebbian
GS         [22]        Connection matrix  No                  Multiple             Continuous               7          Genetic
PSO        [30]        Connection matrix  No                  Multiple             Continuous               5          Swarm
GA         [14]        Initial vector     N/A                 N/A                  Continuous               11         Genetic
RCGA       This paper  Connection matrix  No                  Single               Continuous               4,6,8,10   Genetic

^a Single — historical data consisting of one sequence of state vectors; Multiple — historical data consisting of several sequences of state vectors, for different initial conditions.
^b Initial human intervention is necessary, but no human intervention is needed later when the algorithm is applied.
It draws conclusions from the methods proposed in the past and provides a substantial advancement. The next section provides a detailed description of the proposed method.

3. Proposed learning method

The proposed learning method aims to learn the FCM connection matrix using a genetic algorithm. First, a detailed problem statement is formulated, followed by a detailed description of the specific genetic algorithm that was applied.

3.1. Problem statement

Based on the formal definition of a fuzzy cognitive map presented in Section 2.1, the objective of FCM learning is to determine the connection matrix Ê given the set of concepts N, the sequence of their activation degrees, called input data, C(t) over an iteration interval t ∈ {0, ..., tK}, and a transformation function f, such that the FCM minimizes the error between the given sequence C(t) and the sequence Ĉ(t) obtained by running the FCM model with the initial condition Ĉ(0) = C(0). The error can be measured in different ways, as will be explained later. The aim of the proposed learning method is to eliminate human intervention during the development of an FCM model. This is done by exploiting information from historical data to compute an FCM connection matrix that is able to mimic the data. The input (historical) data comprise one sequence of state vectors over time. The data length is defined as the number of successive iterations (time points) in the given historical data. The input data are used to compute an FCM model, called the candidate FCM, by applying a learning procedure that uses the RCGA algorithm.

Table 3. Connection matrix for an FCM model that consists of N concept nodes

        N1        N2        ...   N(N−1)     NN
N1      0         e12       ...   e1,N−1     e1N
N2      e21       0         ...   e2,N−1     e2N
...     ...       ...       ...   ...        ...
N(N−1)  eN−1,1    eN−1,2    ...   0          eN−1,N
NN      eN1       eN2       ...   eN,N−1     0

Assuming that edges are allowed only between different concepts (nodes), i.e.
concepts do not exhibit cause–effect relationships with themselves, the connection matrix of an FCM model can be completely described by N(N − 1) variables, where N is the number of concepts. Thus, learning the FCM connection matrix boils down to computing the N(N − 1) parameters shown in Table 3. The proposed learning algorithm uses input data to find these parameters. The input data is a sequence of states, described by the state vectors C(t) at particular time points (iterations) t. They illustrate the system's behavior over time. The input data can be plotted to ease understanding and analysis of the dependencies among concepts; see Fig. 2, which corresponds to a fixed-point attractor simulation. The proposed learning method constructs a connection matrix based on the input data. The objective of the learned model is to generate the same state vector sequence for the same initial state vector as defined by the input data. At the same time, the learned model generalizes the inter-relationships between concept nodes that are inferred from the input data. Therefore, the FCM model is suitable for performing simulations from different initial state vectors, and for quantifying the degree and type of the cause–effect relationships between the concepts. The learning method uses a real-coded genetic algorithm, described next, which eliminates expert involvement during the development of the model. A high-level overview of the learning process is shown in Fig. 3.

3.2. Proposed genetic algorithms based learning method

Genetic algorithms (GAs) are used to perform optimization and search tasks. They have been used in numerous and diverse problem domains [6]. Their origin and principles were inspired by natural genetics [10]. Their advantages are broad applicability, relative ease of use, and a global perspective. In this paper we assume the reader's familiarity with GAs; a useful summary of the relevant GA material can be found in [6,10,12].

Fig. 3. High-level diagram of the proposed learning method (the number of nodes and the input data are fed to the genetic algorithm, which outputs the connection matrix).

The proposed learning method uses an extended GA called the real-coded genetic algorithm (RCGA), in which a chromosome consists of floating-point numbers. The RCGA performs a linear transformation of each variable of the solution to decode it to the desired interval. Its main advantages are its applicability to highly dimensional and continuous domains, and the richer spectrum of evolutionary operators that can be applied during the search process [12]. The RCGA uses the input data to develop and optimize, with respect to those data, the connection matrix of a candidate FCM model. The following sections provide details of all essential elements of the RCGA environment, including the structure of the chromosomes, the fitness function, the stopping condition, the genetic operators, and the selection strategy.

3.2.1. Chromosome structure

The RCGA defines each chromosome as a floating-point vector whose length corresponds to the number of variables in a given problem. Each element of the vector is called a gene. In the case of learning FCMs, each chromosome consists of N(N − 1) genes, which are floating-point numbers from the range [−1, 1], defined as follows:

\[ \hat{E} = [e_{12}, e_{13}, \ldots, e_{1N}, e_{21}, e_{23}, \ldots, e_{N,N-1}]^T, \]

where \( e_{ij} \) specifies the value of the weight of the edge from the ith to the jth concept node. Each chromosome has to be decoded back into a candidate FCM. This involves copying the weight values from the chromosome to the corresponding cells of the connection matrix, which defines the connection matrix of the FCM model; for details see Section 3.1.
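The decoding step can be sketched as follows. This is our illustration (assuming numpy; `decode` and `encode` are hypothetical names); the gene ordering follows the chromosome vector Ê above, i.e. row by row, skipping the diagonal.

```python
import numpy as np

def decode(chromosome, n):
    """Copy N(N-1) genes into an n x n connection matrix with a zero diagonal.

    Gene order follows [e12, e13, ..., e1N, e21, e23, ..., eN,N-1].
    """
    E = np.zeros((n, n))
    k = 0
    for i in range(n):
        for j in range(n):
            if i != j:
                E[i, j] = chromosome[k]
                k += 1
    return E

def encode(E):
    """Inverse operation: flatten the off-diagonal entries into a chromosome."""
    n = E.shape[0]
    return [E[i, j] for i in range(n) for j in range(n) if i != j]
```

Keeping the diagonal out of the chromosome enforces the assumption of Section 3.1 that concepts have no cause–effect relationships with themselves.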
The number of chromosomes in a population is constant across generations, and is specified by the population_size parameter.

3.2.2. Fitness function

One of the most important considerations for a successful application of a GA is the design of a fitness function appropriate for the given problem. In the case of learning the connection matrix of an FCM, the design takes advantage of a specific feature of FCM theory: at each iteration of an FCM simulation, the state vector C(t + 1) depends only on the state vector at the preceding iteration. This implies that whenever the system reaches a state that has already been reached in a preceding iteration, its subsequent behavior will be exactly the same regardless of the simulation history. This results in a so-called limit cycle, and means that even if the input data length is K but a limit cycle or fixed-point attractor occurs at the Lth iteration, L < K, the input data used for learning has to be truncated to the first L iterations. The remaining K − L iterations should not be used: they describe behavior that is already described by the first L iterations, and would only add unnecessary computational complexity; for details see Section 2. Let us assume that the input data length is K, where all iterations occurring after the limit cycle or fixed-point attractor are already discarded. By pairing adjacent state vectors, K − 1 different pairs can be formed:

C(t) → C(t + 1)   ∀t = 0, …, K − 2.  (7)

If we treat C(t) as an initial vector and C(t + 1) as the system response, K − 1 pairs of the form {initial vector, system response} can be generated from the input data. The larger K is, the more information about the system behavior we have.
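The truncation and pairing step above can be sketched as follows; the helper name is ours, and the repeated-state test is a direct reading of the limit-cycle argument:

```python
def truncate_and_pair(states):
    """Cut the state vector sequence at the first repeated state
    (the start of a limit cycle, or a fixed-point attractor), then
    form {initial vector, system response} pairs from adjacent
    states: K retained states yield K - 1 pairs."""
    seen = set()
    truncated = []
    for state in states:
        key = tuple(state)
        if key in seen:  # behavior from here on repeats earlier history
            break
        seen.add(key)
        truncated.append(state)
    return list(zip(truncated[:-1], truncated[1:]))
```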
The fitness function is calculated for each chromosome by computing the difference between the system response generated by the candidate FCM and the corresponding system response known directly from the input data. The system response of the candidate FCM is computed by decoding the chromosome into an FCM model and performing a one-iteration simulation from the given initial vector. The difference is accumulated across all K − 1 {initial vector, system response} pairs, using in each case the same initial vector as the input data. Three measures of error, based on the L1, L2, and L∞ norms, are shown below:

Error_Lp = α · Σ_{t=1..K−1} Σ_{n=1..N} |Cn(t) − Ĉn(t)|^p,  (8)

where C(t) = [C1(t), C2(t), …, CN(t)] is the known system response for the initial vector C(t − 1), Ĉ(t) = [Ĉ1(t), Ĉ2(t), …, ĈN(t)] is the system response of the candidate FCM for the initial vector C(t − 1), p = 1, 2, ∞ is the norm type, and α is the parameter used to normalize the error rate, equal to 1/((K − 1) · N) for p ∈ {1, 2} and to 1/(K − 1) for p = ∞ (for p = ∞ the inner sum is replaced by the maximum over n). Each of these error measures can be used as the core of the fitness function:

Fitness function = h(Error_Lp),  (9)

where h is an auxiliary function. The auxiliary function h was introduced for two main reasons:

1. To ensure that better individuals correspond to greater fitness function values. The argument of this function is the summed error rate, and thus needs to be inverted.
2. To embed a non-linearity that rewards chromosomes that are closer to the desired solution.

The following function h was proposed:

h(x) = 1 / (ax + 1),  (10)

where parameter a is established experimentally. The fitness function is normalized to the interval (0, 1], where:

• the worse the individual, the closer its fitness value is to zero;
• the fitness value of an ideal chromosome, i.e. one that results in exactly the same state vector sequence as the input data, is equal to one.
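The error measures (8) and the auxiliary function (10) can be sketched as follows. Here `simulate_one_step` is a hypothetical callable that runs one iteration of the candidate FCM, and for p = ∞ we take the maximum over the nodes, matching the 1/(K − 1) normalization:

```python
def error_lp(pairs, simulate_one_step, p):
    """Normalized error, Eq. (8), accumulated over all K - 1
    {initial vector, system response} pairs."""
    total = 0.0
    n = len(pairs[0][0])  # number of concept nodes N
    for initial, known_response in pairs:
        predicted = simulate_one_step(initial)
        diffs = [abs(c - c_hat) for c, c_hat in zip(known_response, predicted)]
        total += max(diffs) if p == float("inf") else sum(d ** p for d in diffs)
    alpha = 1.0 / len(pairs) if p == float("inf") else 1.0 / (len(pairs) * n)
    return alpha * total

def fitness(error, a):
    """Auxiliary function h of Eq. (10): inverts and squashes the
    error into (0, 1]; an ideal chromosome (zero error) scores 1."""
    return 1.0 / (a * error + 1.0)
```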
if (Fitness_function(best_individual) > max_fitness or t > max_generation) then
    stopping_condition = true;

where best_individual is the chromosome in the current generation with the highest fitness function value, and t is the current generation number.

Fig. 4. Definition of the stopping condition.

3.2.3. Stopping condition

The proposed stopping condition takes into consideration two possible scenarios of the learning process:

• the learning is successful, i.e. the state vector sequence obtained by simulating the candidate FCM is identical or satisfactorily close to the input data. The similarity is described by the fitness function value of the best chromosome in each generation; therefore, the learning should be terminated when the fitness function value reaches a threshold called max_fitness;
• the learning is not successful, but a maximum number of generations, defined by the max_generation parameter, has been reached.

The stopping_condition is defined in Fig. 4. Both the max_fitness and max_generation parameters are established experimentally and given in Section 4.1.1.

3.2.4. Evolutionary operators and selection strategy

Recombination is performed using crossover operations. Many different crossover operators are used in RCGAs, cf. [12,47]. In our experiments we consider simple one-point crossover: it carries a low computational cost yet, as demonstrated through the experiments, it effectively handles the optimization problem. Three different mutation operators were applied: random mutation [24], non-uniform mutation [24], and Mühlenbein's mutation [26]. Two popular selection strategies, namely roulette wheel selection and tournament selection, were applied [10,12].

4. Experiments and results

4.1. Experimental setup

The goal of the experiments is to assess the quality of the proposed method for learning FCMs.
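Before turning to the experiments, the stopping condition of Fig. 4 and two of the operators named in Section 3.2.4 can be sketched as follows. This is a minimal illustration: the parameter defaults are those reported in Section 4.1.1, and `random_mutation` here resamples a gene uniformly from the admissible range [−1, 1]:

```python
import random

def stopping_condition(best_fitness, generation,
                       max_fitness=0.999, max_generation=300_000):
    """Terminate when the best chromosome is close enough to the
    input data, or when the generation budget is exhausted (Fig. 4)."""
    return best_fitness > max_fitness or generation > max_generation

def one_point_crossover(parent_a, parent_b):
    """Simple one-point crossover: swap the gene tails at a random cut."""
    cut = random.randrange(1, len(parent_a))
    return (parent_a[:cut] + parent_b[cut:],
            parent_b[:cut] + parent_a[cut:])

def random_mutation(genes, probability):
    """Replace each gene, with the given probability, by a fresh
    value drawn uniformly from [-1, 1]."""
    return [random.uniform(-1.0, 1.0) if random.random() < probability else g
            for g in genes]
```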
First, experiments focused on selecting the best fitness function from among those proposed in Section 3.2.2 are performed and reported. The aim of these experiments is to analyze the influence of the fitness function on the learning process and to select the one that gives the best quality in terms of convergence and accuracy. The tests carried out with the selected function are divided into two groups: tests performed with synthetic data, and tests performed with real-life data. The first group uses synthetic input data generated from randomly generated input FCM models. The latter group uses real-life input FCMs, i.e. models previously published in the scientific literature, to generate the input data.

Fig. 5. High-level diagram of the experimental setup.

Evaluation of the solution quality is performed by testing the similarity between the data and by studying the generalization capabilities of the candidate FCM. The latter criterion takes advantage of the availability of the input FCM model, and is performed by running simulations from new, randomly generated initial state vectors and observing whether the two maps exhibit the same behavior. A diagram presenting an overview of the test procedure is shown in the bottom part of Fig. 5, while the upper part of Fig. 5 shows a typical application of the proposed learning method, where the RCGA is used to infer the FCM connection matrix from the input data. In general, the goals of learning an FCM can be divided into two categories.
The first objective is to find an FCM connection matrix that generates the same state vector sequence as the input data for a given initial state vector. Since there is a risk that overfitted models would behave correctly only for this particular initial state vector, the second goal is to find a candidate FCM that behaves similarly to the input FCM for different initial state vectors. Fulfillment of this goal is guaranteed only when the two connection matrices are exactly the same. However, since we cannot assume that only FCMs with identical connection matrices behave the same, a test that simulates the input and the candidate FCMs for a set of different initial state vectors and compares their outcomes needs to be performed. To verify the above goals, two different evaluation criteria were developed. They aim to evaluate the quality of the candidate FCM in terms of similarity to the input data and of generalization. Below, the two criteria are defined and explained:

1. Input data error criterion, which measures the similarity between the input data and the data generated by simulating the candidate FCM from the same initial state vector as the input data. The criterion is defined as the normalized average error between corresponding concept values at each iteration of the two state vector sequences:

error_initial = 1/((K − 1) · N) · Σ_{t=1..K−1} Σ_{n=1..N} |Cn(t) − Ĉn(t)|,  (11)

where Cn(t) is the value of node n at iteration t in the input data, Ĉn(t) is the value of node n at iteration t in the simulation of the candidate FCM, K is the input data length, and N is the number of nodes.

2. Model behavior error criterion, which measures the generalization capabilities of the candidate FCM. To compute this criterion, both the input and the candidate FCMs are simulated from P randomly chosen initial state vectors.
Next, the error_initial value is computed for each of the simulations to compare the state vector sequences generated by the input and the candidate FCM, and the average of these values is taken:

error_behavior = 1/(P · (K − 1) · N) · Σ_{p=1..P} Σ_{t=1..K−1} Σ_{n=1..N} |Cn^p(t) − Ĉn^p(t)|,  (12)

where Cn^p(t) is the value of node n at iteration t in the data generated by the input FCM started from the pth initial state vector, Ĉn^p(t) is the value of node n at iteration t in the data generated by the candidate FCM started from the pth initial state vector, K is the input data length, N is the number of nodes, and P is the number of different initial state vectors.

Both of these criteria are used in the synthetic and real-life data tests to quantify the quality of the learned candidate FCM. In order to put the quality of the proposed learning method into perspective, a set of baseline results was computed. The baseline was generated by computing these criteria for randomly generated FCMs compared against the input FCM. Ten random FCMs were generated for each test category, and the average values of the two criteria are reported as the baseline.

The quality of learning depends on the quality of the input data. In the case of FCMs, for the learning to be successful, the limit cycle or fixed-point attractor must not appear in the first few simulation iterations of the input FCM. State vector values after a limit cycle follow a cycle that is already described by the preceding state vector values, while state vector values after a fixed-point attractor are constant. Therefore, the learning algorithm uses only those state vector values that precede the limit cycle or fixed-point attractor. As a result, if they appear early in the sequence, the input data may be too short to accurately learn a candidate FCM that corresponds to the input FCM.
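Criteria (11) and (12) reduce to simple averages of absolute differences. A sketch with our own helper names, where each sequence includes the shared initial state at t = 0, which is excluded from the sum:

```python
def error_initial(input_seq, candidate_seq):
    """Eq. (11): average absolute difference over iterations
    t = 1..K-1 and all N nodes; t = 0 is the common initial vector."""
    diffs = [abs(c - c_hat)
             for c_t, c_hat_t in zip(input_seq[1:], candidate_seq[1:])
             for c, c_hat in zip(c_t, c_hat_t)]
    return sum(diffs) / len(diffs)

def error_behavior(input_runs, candidate_runs):
    """Eq. (12): error_initial averaged over P pairs of runs started
    from P randomly chosen initial state vectors."""
    scores = [error_initial(a, b) for a, b in zip(input_runs, candidate_runs)]
    return sum(scores) / len(scores)
```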
In this case it is possible to learn a candidate FCM that generates a state vector sequence identical to the input data, yet cannot be considered a good solution because it fails the generalization criterion, i.e. the error_initial value is small but error_behavior is relatively high. To avoid this problem, the minimum number of iterations before the limit cycle or fixed-point attractor, called min_data_length, was set to 20 for all tests. Also, the max_data_length parameter, which specifies the maximum number of iterations before the limit cycle or fixed-point attractor, was set to 50 to reduce the simulation time required to perform the experiments. In general, a greater input data length improves the chance of learning a high-quality candidate FCM, but also increases the computational time.

Several other minor assumptions were also made. For both the input and candidate FCMs, all weight values with magnitude smaller than 0.05 were rounded down to 0, since no real-life map considers such weak relationships. Moreover, during simulations all node values are rounded to two digits after the decimal point, which results from a trade-off between model comprehensibility and accuracy of relationship representation.

4.1.1. RCGA parameters

The parameters defined in Section 3.2 were instantiated with specific values based on simulations and learning performance for experiments performed with different FCM sizes and densities. The experiments were performed by generating a random FCM, which was used to generate input data. Next, the RCGA was used to generate a candidate FCM based on that input data. The goal was to find parameter values that lead to convergence, regardless of the size and density of the FCM, in reasonable running time.
In total, 50 experimental learning results were visually inspected to establish the values reported below:

• recombination method: single-point crossover;
• mutation method: randomly chosen from random mutation, non-uniform mutation, and Mühlenbein's mutation;
• selection method: randomly chosen from roulette wheel and tournament;
• probability of recombination: 0.9;
• probability of mutation: 0.5. The high mutation probability is due to the large number of sub-optimal solutions for each FCM model. A low mutation rate leads to slow exploration of the search space and, in consequence, the algorithm may get stuck in a sub-optimal solution that is very good in terms of the error_initial criterion yet carries a substantial error_behavior value. The completed experiments show that high mutation rates ensure better performance of the final model;
• population_size: 100 chromosomes;
• max_generation: 300,000;
• max_fitness: 0.999;
• fitness function: based on the L1, L2, and L∞ norms, see Section 3.2.2 for details.

All experiments were performed with the logistic transformation function, see Section 2.1, as a generalization of the discrete-type function, which results in a more flexible representation of node activation degrees. This parameterized function is given below:

f(x) = 1 / (1 + e^(−5x)).  (13)

The parameter value of 5 is commonly used in simulations performed with FCM models [25].

The set of experiments focused on comparing different fitness functions, described in Section 4.2.1, was preceded by establishing the parameter a of the auxiliary function for all considered fitness functions, see formula (10). This was done by performing 20 simulations for each function and observing convergence to the desired solutions. As a result, a was set to 100 for the fitness functions based on the L1 and L∞ norms, and to 10,000 for the function based on the L2 norm.
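Putting the transformation function (13) together with the FCM update rule gives a short simulation sketch. We assume the common update Cj(t + 1) = f(Σi eij · Ci(t)) without a self-feedback term, and round node values to two decimal places as stated in Section 4.1; the names are ours:

```python
import math

def transform(x):
    """Logistic transformation, Eq. (13), with steepness parameter 5."""
    return 1.0 / (1.0 + math.exp(-5.0 * x))

def simulate(matrix, initial_state, iterations):
    """Iterate the map: each node's next value is the transformed,
    weighted sum of the current values of its predecessor nodes."""
    n = len(matrix)
    state = [round(v, 2) for v in initial_state]
    states = [state]
    for _ in range(iterations):
        state = [round(transform(sum(state[i] * matrix[i][j]
                                     for i in range(n))), 2)
                 for j in range(n)]
        states.append(state)
    return states
```

Since the logistic function maps every weighted sum into (0, 1), all simulated node values stay within the unit interval, consistent with the plots in Figs. 6 and 7.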
The final formula for all considered fitness functions is presented below:

Fitness_function_Lp = 1 / (a · Error_Lp + 1),  (14)

where p = 1, 2, ∞ is the norm applied in the fitness function, Error_Lp is the error formula introduced in Section 3.2.2, and a = 100 for the L1 and L∞ norms and a = 10,000 for the L2 norm.

Two examples of FCM learning experiments with the L2-based fitness function that use the above parameters are plotted in Figs. 6 and 7. The figures show the fitness value of the best chromosome and the average fitness value over the entire population during the learning process, with respect to the iteration number, as well as the average weight change of the best solution between two consecutive iterations, sampled every 3000 iterations. Please note that the weight-change bars are plotted using a logarithmic scale on the y-axis, shown on the right-hand side of each plot.

Fig. 6. FCM with 8 nodes and 40% density.

Fig. 7. FCM with 6 nodes and 60% density.

The results show that the RCGA gradually converges to a high-quality candidate FCM.

4.2. Experiments outcome

4.2.1. Experiments to select the best fitness function

The goal of these experiments is to choose the fitness function for the genetic learning of FCMs. A set of tests, similar to the experiments with synthetic data described in Section 4.2.2, is carried out with all three functions proposed in Section 3.2.2. In order to achieve reliable results, several different FCM model configurations are used in these experiments, i.e. model sizes of 6 and 8 nodes, and densities of 20%, 40%, 60% and 80%. In order to compare the different fitness functions, a baseline value was estimated for each of them. The baseline was computed by choosing 10 different randomly generated FCM models, calculating the corresponding error coefficients, see formula (8), and taking the average value.
Table 4
Experiment results with different fitness functions

                        Fitness function L1       Fitness function L2       Fitness function L∞
# Nodes  Density (%)    Error_L1    Ratio (%)     Error_L2    Ratio (%)     Error_L∞    Ratio (%)
6        20             1.30E−03    3.60E−01      2.40E−06    1.17E−03      1.48E−02    1.77E+00
6        40             2.92E−03    8.06E−01      5.00E−08    2.44E−05      1.30E−02    1.55E+00
6        60             3.78E−03    1.04E+00      2.00E−07    9.76E−05      9.83E−03    1.18E+00
6        80             3.39E−03    9.35E−01      5.00E−08    2.44E−05      9.97E−03    1.19E+00
8        20             5.53E−03    1.53E+00      1.38E−05    6.73E−03      3.15E−02    3.77E+00
8        40             8.25E−03    2.28E+00      1.07E−05    5.22E−03      3.29E−02    3.94E+00
8        60             5.60E−03    1.55E+00      2.04E−05    9.95E−03      2.78E−02    3.32E+00
8        80             5.59E−03    1.54E+00      1.62E−05    7.90E−03      1.58E−02    1.89E+00

Ratio (%) = Error_Lp / Baseline_Lp.

Each considered FCM configuration, in terms of the number of nodes and density, was simulated 10 times with each of the three fitness functions, which totalled 240 experiments. Table 4 reports the results, which include the average error rates and the corresponding baselines. The baselines for the corresponding error rates, which were obtained experimentally, have the following values: Baseline_L1 = 0.362, Baseline_L2 = 0.205, Baseline_L∞ = 0.836. Table 4 reports the ratio of the error in each of the metrics to the corresponding baseline, which expresses the accuracy of the learned FCM with respect to a randomly generated solution. Analyzing the ratio values, the results show that the best performance is achieved with the fitness function based on the L2 norm. In order to further verify this hypothesis, the results for the L1- and L∞-based fitness functions were also expressed using the L2 norm. This allows for a direct comparison among the three fitness functions, see Table 5. Fig. 8 shows the relationship between the FCM parameters, i.e.
number of nodes and density, and the ratio values in the L2 norm for each fitness function. Analyzing the results, the best convergence is consistently achieved with the L2-based fitness function. Error rates for the other two functions are larger and increase more rapidly with increasing FCM size. The advantage of the L2-based fitness function over the L1-based one comes from the fact that the former is more sensitive to errors, which leads to faster convergence. On the other hand, the L∞-based function gives reasonably good results for small FCMs, yet is ineffective when the search space grows. We note that these experiments show that proper selection of the fitness function is of great importance for genetic algorithms. Considering the above conclusions, the remaining experiments, i.e. for both synthetic and real-life data, are carried out with the best fitness function, which is based on the L2 norm.

Table 5
Comparison of learning quality for different fitness functions

                        Fitness function L1       Fitness function L2       Fitness function L∞
# Nodes  Density (%)    Error_L2    Ratio (%)     Error_L2    Ratio (%)     Error_L2    Ratio (%)
6        20             3.30E−04    1.61E−01      2.40E−06    1.17E−03      4.87E−03    2.38E+00
6        40             9.88E−03    4.82E+00      5.00E−08    2.44E−05      2.49E−03    1.21E+00
6        60             2.13E−02    1.04E+01      2.00E−07    9.76E−05      4.98E−04    2.43E−01
6        80             8.18E−03    3.99E+00      5.00E−08    2.44E−05      1.66E−03    8.09E−01
8        20             3.16E−02    1.54E+01      1.38E−05    6.73E−03      7.78E−02    3.80E+01
8        40             2.63E−02    1.28E+01      1.07E−05    5.22E−03      6.77E−02    3.30E+01
8        60             1.18E−02    5.74E+00      2.04E−05    9.95E−03      1.95E−02    9.50E+00
8        80             2.54E−03    1.24E+00      1.62E−05    7.90E−03      2.22E−03    1.08E+00

Ratio (%) = Error_L2 / Baseline_L2.
Fig. 8. Influence of different fitness functions on learning quality.

4.2.2. Synthetic data

These tests use randomly generated input FCM models to generate synthetic input data, which is then used to learn a candidate FCM. A wide range of input FCM models was considered to accommodate the different types of real-life FCM models reported in Table 1. As a result, the tests were performed with FCMs with 4, 6, 8, and 10 nodes, and with densities of 20%, 40%, 60% and 80%, which gives 16 test configurations. Each test was carried out according to the following routine:

1. Set the parameters: number of nodes N, density D, number of experiments T, number of tests P, current experiment t = 1
2. Generate a random FCM model with N nodes and density D
3. Generate a random initial state vector
4. Perform simulation from the generated initial state vector to generate the input data
5. IF the input data length is NOT between min_data_length and max_data_length THEN go to 2
6. Apply the RCGA to learn a candidate FCM model
7. Compute the error_initial parameter for the candidate FCM model
8. Compute the error_behavior parameter by performing P simulations of the input and candidate FCMs with new, randomly generated initial state vectors
9. IF t < T THEN increase t by one AND go to 2
10. Report average values and standard deviations for error_initial and error_behavior

Parameter T, which specifies the number of experiments carried out for a given configuration, and P, which specifies the number of tests with a different initial state vector in each case, were both set to 10. Thus, the average values and standard deviations over the 10 experiments are reported. A total of almost 200 tests were performed: for each FCM size and density, 10 tests were performed, and average results are reported.
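Step 2 of the routine, generating a random FCM of given size and density, can be sketched as follows. The helper is ours; it also applies the 0.05 weight cutoff stated in Section 4.1:

```python
import random

def random_fcm(n, density, rng=random.Random(0)):
    """Random N x N connection matrix: each off-diagonal weight is
    non-zero with probability `density` and drawn uniformly from
    [-1, 1]; weights with magnitude below 0.05 are rounded down to 0."""
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j and rng.random() < density:
                w = rng.uniform(-1.0, 1.0)
                matrix[i][j] = w if abs(w) >= 0.05 else 0.0
    return matrix
```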
Additionally, for each of the 10 tests, 10 simulations were performed to compute the error_behavior value, which brings the total number of simulations to almost 2000; the average values are reported.

In the following, selected results achieved by the RCGA are presented, in terms of a comparison of the state vector sequences generated by the candidate and input FCMs for several different initial state vectors, and of the error defined as the difference between the input data and the generated sequence. Figs. 9 and 10 show the data used to compute the values of the error_initial and error_behavior criteria. They present a comparison, in terms of plots and errors, between the input data and the state vector sequence generated by the candidate FCM for the initial state vector defined in the input data; see Figs. 9a, b, e, f and 10a, b, e, f. They also present a comparison between the best and the worst state vector sequences generated by the candidate FCM, in terms of similarity to the sequences generated by the input FCM, for experiments performed with 10 new, randomly generated initial state vectors; see Figs. 9c, d, g, h and 10c, d, g, h. The former results were used to compute the error_initial value, while the latter were used to compute the error_behavior value. Figs. 9 and 10 present the results of two representative experiments, one for 8 nodes and 40% density, and the other for 6 nodes and 60% density. For each experiment, the best and the worst runs, in terms of the error_initial value, were selected and presented using four consecutive figures. The figures show:

1. the input data and the result of simulating the candidate FCM from the initial condition defined in the input data (Figs. 9a,e and 10a,e),
2. the error rate, expressed as the absolute difference between corresponding state vector values of the two sequences described above (Figs. 9b,f and 10b,f),
3.
the results of the best among ten experiments performed with a new initial state vector for both the input and candidate FCM models (Figs. 9c,g and 10c,g),
4. the results of the worst among ten experiments performed with a new initial state vector for both the input and candidate FCM models (Figs. 9d,h and 10d,h).

The following rules characterize the figures:

• Plots shown in black depict simulations of the input FCM, while grey plots concern simulations performed with the candidate FCM. Ideally they should overlap, in which case only the grey lines are visible.

Fig. 9. Experiments for nodes = 8 and density = 40%: (a) Input data and plot obtained from simulation of candidate FCM; (b) Difference between corresponding state vector values for plots from (a); (c) Comparison of the best simulation of candidate and input FCMs with a new state vector; (d) Comparison of the worst simulation of candidate and input FCMs with a new state vector; (e) Input data and plot obtained from simulation of candidate FCM; (f) Difference between corresponding state vector values for plots from (e); (g) Comparison of the best simulation of candidate and input FCMs with a new state vector; (h) Comparison of the worst simulation of candidate and input FCMs with a new state vector.

Fig. 10.
Experiments for nodes = 6 and density = 60%: (a) Input data and plot obtained from simulation of candidate FCM; (b) Difference between corresponding state vector values for plots from (a); (c) Comparison of the best simulation of candidate and input FCMs with a new state vector; (d) Comparison of the worst simulation of candidate and input FCMs with a new state vector; (e) Input data and plot obtained from simulation of candidate FCM; (f) Difference between corresponding state vector values for plots from (e); (g) Comparison of the best simulation of candidate and input FCMs with a new state vector; (h) Comparison of the worst simulation of candidate and input FCMs with a new state vector.

Table 6
Experimental results for the synthetic data

Nodes  Density (%)  Error_initial ± stdev  Error_behavior ± stdev
4      20           0.000 ± 0.000          0.001 ± 0.001
4      40           0.000 ± 0.000          0.000 ± 0.000
4      60           0.000 ± 0.000          0.000 ± 0.000
4      80           0.000 ± 0.000          0.000 ± 0.000
6      20           0.005 ± 0.005          0.017 ± 0.030
6      40           0.005 ± 0.006          0.018 ± 0.027
6      60           0.004 ± 0.004          0.022 ± 0.029
6      80           0.003 ± 0.003          0.011 ± 0.016
8      20           0.057 ± 0.043          0.102 ± 0.086
8      40           0.015 ± 0.021          0.052 ± 0.063
8      60           0.014 ± 0.020          0.056 ± 0.060
8      80           0.006 ± 0.008          0.036 ± 0.060
10     20           0.088 ± 0.095          0.168 ± 0.147
10     40           0.037 ± 0.048          0.094 ± 0.102
10     60           0.026 ± 0.039          0.085 ± 0.125
10     80           0.006 ± 0.009          0.089 ± 0.138

Nodes—number of nodes of the input FCM; density (%)—ratio of non-zero weights to the total number of weights; error_initial, error_behavior—average values of the corresponding evaluation criteria; stdev—standard deviation of the corresponding reported criterion.

• The bar plots show the error, in terms of the absolute difference, between corresponding state vector values generated by the input and the candidate FCMs.
The following conclusions can be drawn from the analysis of the above figures:

• The error_initial values were relatively low in all considered cases, which confirms the overall results. In one case the candidate FCM was able to perfectly recover the input data, which resulted from its connection matrix being identical to that of the input FCM, see Figs. 9a and b.
• In general, better results, in terms of recovering the input data, were obtained for cases in which the sequence in the input data heads for a fixed-point attractor, see Figs. 9a and 10a. In such cases the errors decrease with simulation time and stabilize at a certain low level. On the other hand, models that head for a limit cycle turned out to have higher error rates, see Figs. 9e and 10e.
• Simulations of the input and candidate FCMs for experiments with new initial state vectors again show that fixed-point attractor sequences give better learning results than limit cycle sequences. We observed that, in general, FCM models have an inclination to head for the same fixed-point attractor regardless of the value of the initial state vector, which was also reported in [37]. Therefore, the results for the fixed-point attractor sequences are on average better; compare Figs. 9d and h, and 10d and h.

Table 6 shows a summary of the results for all performed experiments, considering different numbers of concept nodes and densities.

Fig. 11. Error_initial values as a function of number of nodes and FCM density.

Fig. 12. Error_behavior values as a function of number of nodes and FCM density.

To ease the analysis of the results, the relation between the FCM parameters, i.e. size and density, and each of the evaluation criteria is presented in Figs. 11 and 12. Each of these figures includes a table with the
average values of the corresponding criterion across the different sizes and densities of the input FCMs. The content of the tables is also represented as graphs, which additionally include the baseline results.

The error_initial values, which describe the error between the input data and the state vector sequence generated by the candidate FCM, are relatively small for all considered experiments. We note that they increase slightly with increasing size and decreasing density of the input FCM, but even for the 10-node, 20% dense FCM the value indicates that a high-quality candidate FCM was found. Since this criterion was used to develop the fitness function of the RCGA, the low values indicate that the learning method is effective in finding a good-quality solution. In general, this result indicates that the proposed learning method is able to find an FCM that closely mimics the input data for the same initial state vector.

The second criterion, error_behavior, shows how the RCGA copes with preventing overfitting to the initial data, or in other words with finding a well-generalized solution, i.e. an FCM that fits the input data well but also behaves similarly to the input FCM for different initial state vectors. High values of this criterion would indicate poor quality of the candidate FCM. Compared to the error_initial criterion, the deterioration of the error_behavior values is also linear, but more rapid. We note that the error_behavior values increase with increasing size of the input FCM, and are approximately constant across different densities. For some experiments, slightly better results were obtained for denser FCM models. This indicates that the quality of the learning method deteriorates with increasing map size, but is almost independent of the density.
Comparison of the achieved results with the baseline values reveals that the proposed learning algorithm generates candidate FCMs of good quality. We observe that the RCGA generates very high quality candidate FCMs for problems involving 4 and 6 concept nodes, while the quality for larger problems is still acceptable. In general, we conclude that the experimental results demonstrate the usefulness of genetic algorithm based learning for this problem.

4.2.3. Real-life data

These tests were performed with two FCM models that have been reported in the literature. Larger models, with 7 and 10 concept nodes, were selected to show the quality of the proposed learning method for harder domains. In this case the FCM was predefined by the original author (domain expert). The experiments involved simulating the input FCM to generate input data, and then using the input data to generate a candidate FCM. Next, error_initial was computed for the candidate FCM and, as for the synthetic data, 10 experiments with new, randomly generated state vectors were performed to compute the error_behavior value. Two experiments with real-life FCMs were performed: the first involves a 7-node map, while the other a 10-node map.

4.2.3.1. Experiment 1—e-business company: The first experiment was performed with the FCM model proposed by Tsadiras, which concerns business industry and financial activities [44]. This FCM describes the relationships among seven concepts that were identified as important in the strategic planning process of an e-business company. The following concepts were considered: e-business profits, e-business sales, price cutoffs, customer satisfaction, staff recruitment, impact from international e-business competition, and better e-commerce services. The density of the considered FCM was 40%. The input data for learning, shown in Fig.
13, was generated with a randomly chosen initial state vector, and reaches the fixed-point attractor state after 10 iterations. The input data was applied to the proposed RCGA learning algorithm, resulting in the learning progress shown in Fig. 14. The learning results were evaluated identically as in the case of the experiments with the synthetic data. The error_initial (±stdev) value was 0.004 (±0.005), while the error_behavior (±stdev) value was 0.01 (±0.02). Selected results achieved by the RCGA algorithm for the e-business FCM, in terms of comparison of state vector sequences and the error defined by the difference between the input data and the generated sequence, are presented in Fig. 15. They follow the same rules as defined for the experiments with the synthetic data. The following conclusions can be drawn based on analysis of the results of this experiment:
• The quality of the candidate FCM is very good in terms of both criteria, despite the fact that the input data length was shorter than assumed for the synthetic data experiments.
• Results from simulation with a new initial state vector show that for certain cases the results obtained from the candidate FCM simulation are almost identical with those from the original FCM. For others, some differences are present in several initial states, but the system stabilizes in the same state.

Fig. 13. Input data for e-business company FCM.

Fig. 14. Learning process for the e-business company FCM.

4.2.3.2. Experiment 2—squad of soldiers in combat: The second experiment was performed with an FCM model proposed by Kosko, which describes a squad of soldiers in combat; for details see [19].
The map includes 10 nodes, which describe the behavior of soldiers: cluster, proximity of enemy, receiving fire, presence of authority, firing weapons, peer visibility, spread out, taking cover, advancing, and fatigue. The density of the considered FCM was 34%. The input data for learning, shown in Fig. 16, was generated with a randomly chosen initial state vector, and reaches the fixed-point attractor state after 12 iterations. The input data was applied to the proposed RCGA learning algorithm, resulting in the learning progress shown in Fig. 17.

Fig. 15. Selected results achieved for the e-business FCM: (a) Input data and plot obtained from simulation of candidate FCM; (b) Difference between corresponding state vector values for plots from (a); (c) Comparison of the best simulation of candidate and input FCMs with a new state vector; (d) Comparison of the worst simulation of candidate and input FCMs with a new state vector.

Fig. 16. Input data for squad of soldiers FCM.

Fig. 17. Learning process for the squad of soldiers FCM.

Fig. 18. Selected results achieved for the squad of soldiers in combat FCM: (a) Input data and plot obtained from simulation of candidate FCM; (b) Difference between corresponding state vector values for plots from (a); (c) Comparison of the best simulation of candidate and input FCMs with a new state vector; (d) Comparison of the worst simulation of candidate and input FCMs with a new state vector.
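The input sequences used in both real-life experiments are produced by iterating the FCM from a chosen initial state vector until a fixed-point attractor is reached. A minimal sketch of such a simulation, assuming the commonly used sigmoid transformation with steepness 5 (the update rule and parameters are standard FCM conventions, not the exact models from [19] or [44]):

```python
import math


def simulate_fcm(weights, state, max_iters=100, tol=1e-4):
    """Iterate an FCM until the state stops changing (a fixed-point
    attractor) or max_iters is reached; return the state sequence.

    weights[j][i] is the causal influence of node j on node i; the
    sigmoid keeps node values in (0, 1). Steepness 5.0 is an assumed,
    commonly used value.
    """
    n = len(state)
    sequence = [list(state)]
    for _ in range(max_iters):
        new_state = []
        for i in range(n):
            activation = sum(weights[j][i] * state[j] for j in range(n))
            new_state.append(1.0 / (1.0 + math.exp(-5.0 * activation)))
        sequence.append(new_state)
        if max(abs(a - b) for a, b in zip(new_state, state)) < tol:
            break  # fixed-point attractor reached
        state = new_state
    return sequence
```

In the experiments above, the e-business map stabilizes after 10 iterations and the squad-of-soldiers map after 12; the resulting sequence serves directly as the learning input.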
For the experiments with the squad of soldiers in combat FCM, the error_initial (±stdev) value was 0.02 (±0.03), while the error_behavior (±stdev) value was 0.04 (±0.09). Other selected results, similarly as in the case of experiment 1, are presented in Fig. 18. On the basis of the experiment with the squad of soldiers FCM the following conclusions can be drawn:
• The quality of results is good, but slightly worse than for the e-business FCM. The reason is the greater number of nodes, which implies a more complex model; this agrees with the results for the synthetic data.
• The generalization of the model, expressed by examining the similarity of plots obtained from new initial vectors, is satisfactory; visible differences occur only in a few initial states, but the system stabilizes in the same state.

The experiments performed with real-life data were focused on examining the proposed learning approach when dealing with real systems reported in the literature. The results obtained from the simulations show the high usefulness of this learning method. The quality of the candidate FCMs, which is expressed by the two error values, is comparable with the corresponding results obtained from the synthetic data.

5. Conclusions and further directions

In this study, we have developed a comprehensive learning environment for the development of fuzzy cognitive maps. It has been shown how genetic optimization helps construct maps on the basis of experimental numeric data. We demonstrated the feasibility and effectiveness of the evolutionary approach. The paper discusses relevant work, and proposes and tests a novel learning strategy based on a real-coded genetic algorithm. The method is able to generate an FCM model from input data consisting of a single sequence of state vector values. The first set of experiments aimed to design a high-quality genetic-algorithm-based learning strategy based on different fitness functions.
In order to achieve this goal, three different functions were examined, and the best among them was selected. Later, a comprehensive set of tests was performed with the selected function, including experiments with both synthetic and real-life data, and different sizes and densities of FCMs. The results show that the proposed learning method is very effective, and generates FCM models that can almost perfectly represent the input data. We note that the quality of learning deteriorates with the increasing size of the maps. In general, the proposed method achieved excellent quality for maps of up to 6 nodes, while for maps of up to 10 nodes the quality is still satisfactory. Since many different configurations of FCMs have been tested, the produced results could provide guidelines for other learning methods. Future work will concern the improvement of the proposed learning method, especially in terms of its scalability (computational complexity) and convergence. One interesting and open issue worth pursuing would be to associate the parameters of the genetic algorithm with the characteristics of the given experimental data. Another interesting direction concerns the use of the learning method in the context of practical applications. These could involve areas such as, e.g., the stock exchange or sports betting.

Acknowledgements

The authors would like to thank the anonymous reviewers for their constructive comments. This research was supported in part by the Natural Sciences & Engineering Research Council of Canada (NSERC).

References

[1] J. Aguilar, Adaptive random fuzzy cognitive maps, Proc. 8th Ibero-American Conf. on AI, 2002, pp. 402–410.
[2] J. Aguilar, A dynamic fuzzy-cognitive-map approach based on random neural networks, Internat. J. Comput. Cognition 1 (4) (2003) 91–107.
[3] J. Aguilar, A survey about fuzzy cognitive maps papers (Invited Paper), Internat. J. Comput. Cognition 3 (2) (2005) 27–33.
[4] R.
Axelrod, Structure of Decision: The Cognitive Maps of Political Elites, Princeton University Press, Princeton, NJ, 1976.
[5] C. Carlsson, R. Fuller, Adaptive fuzzy cognitive maps for hyperknowledge representation in strategy formation process, Proc. Internat. Panel Conf. on Soft and Intelligent Computing, 1996, pp. 43–50.
[6] K. Deb, An introduction to genetic algorithms, Sadhana 24 (4) (1999) 205–230.
[7] J.A. Dickerson, B. Kosko, Fuzzy virtual worlds, Artif. Intel. Expert 7 (1994) 25–31.
[8] J.A. Dickerson, B. Kosko, Virtual worlds as fuzzy cognitive maps, Presence 3 (2) (1994) 173–189.
[9] V.C. Georgopoulos, G.A. Malandraki, C.D. Stylios, A fuzzy cognitive map approach to differential diagnosis of specific language impairment, J. Artif. Intel. Med. 29 (3) (2003) 261–278.
[10] D.E. Goldberg, Genetic Algorithms in Search, Optimization, and Machine Learning, Addison-Wesley, Reading, MA, 1989.
[11] K. Gotoh, J. Murakami, T. Yamaguchi, Y. Yamanaka, Application of fuzzy cognitive maps to supporting for plant control, Proc. SICE Joint Symp. 15th Systems Symp. and Tenth Knowledge Engineering Symp., 1989, pp. 99–104.
[12] F. Herrera, M. Lozano, J.L. Verdegay, Tackling real-coded genetic algorithms: operators and tools for behavioural analysis, Artif. Intel. Rev. 12 (4) (1998) 265–319.
[13] D. Kardaras, G. Mentzas, Using fuzzy cognitive maps to model and analyse business performance assessment, in: J. Chen, A. Mital (Eds.), Advances in Industrial Engineering Applications and Practice II, 1997, pp. 63–68.
[14] M.S. Khan, A. Chong, Fuzzy cognitive map analysis with genetic algorithm, Proc. 1st Indian Internat. Conf. on Artificial Intelligence (IICAI-03), 2003.
[15] M. Khan, M. Quaddus, Group decision support using fuzzy cognitive maps for causal reasoning, Group Decision Negotiation J. 13 (5) (2004) 463–480.
[16] B. Kosko, Fuzzy cognitive maps, Internat. J. Man-Mach. Studies 24 (1986) 65–75.
[17] B.
Kosko, Hidden patterns in combined and adaptive knowledge networks, Internat. J. Approx. Reason. 2 (1988) 377–393.
[18] B. Kosko, Neural Networks and Fuzzy Systems, Prentice-Hall, Englewood Cliffs, NJ, 1992.
[19] B. Kosko, Fuzzy Engineering, Prentice-Hall, Englewood Cliffs, NJ, 1997.
[20] D.E. Koulouriotis, I.E. Diakoulakis, D.M. Emiris, Anamorphosis of fuzzy cognitive maps for operation in ambiguous and multi-stimulus real world environments, 10th IEEE Internat. Conf. on Fuzzy Systems, 2001, pp. 1156–1159.
[21] D.E. Koulouriotis, I.E. Diakoulakis, D.M. Emiris, Learning fuzzy cognitive maps using evolution strategies: a novel schema for modeling and simulating high-level behavior, IEEE Congr. on Evolutionary Computation (CEC2001), 2001, pp. 364–371.
[22] D.E. Koulouriotis, I.E. Diakoulakis, D.M. Emiris, E.N. Antonidakis, I.A. Kaliakatsos, Efficiently modeling and controlling complex dynamic systems using evolutionary fuzzy cognitive maps (Invited Paper), Internat. J. Comput. Cognition 1 (2) (2003) 41–65.
[23] K.C. Lee, W.J. Lee, O.B. Kwon, J.H. Han, P.I. Yu, Strategic planning simulation based on fuzzy cognitive map knowledge and differential game, Simulation 71 (5) (1998) 316–327.
[24] Z. Michalewicz, Genetic Algorithms + Data Structures = Evolution Programs, Springer, Berlin, 1992.
[25] S.T. Mohr, The use and interpretation of fuzzy cognitive maps, Master's Project, Rensselaer Polytechnic Institute, 1997.
[26] H. Mühlenbein, D. Schlierkamp-Voosen, Predictive models for the breeder genetic algorithm I. Continuous parameter optimization, Evol. Comput. 1 (1) (1993) 25–49.
[27] T.D. Ndousse, T. Okuda, Computational intelligence for distributed fault management in networks using fuzzy cognitive maps, Proc. IEEE Internat. Conf. Communications Converging Technologies for Tomorrow's Application, 1996, pp. 1558–1562.
[28] E.I. Papageorgiou, K.E. Parsopoulos, C.D. Stylios, P.P. Groumpos, M.N. Vrahatis, Fuzzy cognitive maps learning using particle swarm optimization, J.
Intel. Inform. Systems, in press.
[29] E. Papageorgiou, C.D. Stylios, P.P. Groumpos, Fuzzy cognitive map learning based on nonlinear Hebbian rule, Australian Conf. on Artificial Intelligence, 2003, pp. 256–268.
[30] E. Papageorgiou, C.D. Stylios, P.P. Groumpos, Active Hebbian learning algorithm to train fuzzy cognitive maps, Internat. J. Approx. Reason. 37 (3) (2004) 219–249.
[31] K.E. Parsopoulos, E.I. Papageorgiou, P.P. Groumpos, M.N. Vrahatis, A first study of fuzzy cognitive maps learning using particle swarm optimization, Proc. IEEE 2003 Congr. on Evolutionary Computation, 2003, pp. 1440–1447.
[32] C.E. Pelaez, J.B. Bowles, Applying fuzzy cognitive maps knowledge representation to failure modes effects analysis, Proc. IEEE Annu. Symp. on Reliability and Maintainability, 1995, pp. 450–456.
[33] S. Renals, R. Rohwer, A study of network dynamics, J. Statist. Phys. 58 (1990) 825–848.
[34] T.L. Saaty, The Analytic Hierarchy Process: Planning, Priority Setting, Resource Allocation, McGraw-Hill, New York, 1980.
[35] M. Schneider, E. Shnaider, A. Kandel, G. Chew, Automatic construction of FCMs, Fuzzy Sets and Systems 93 (2) (1998) 161–172.
[36] A. Siraj, S. Bridges, R. Vaughn, Fuzzy cognitive maps for decision support in an intelligent intrusion detection system, IFSA World Congr. and 20th NAFIPS Internat. Conf., vol. 4, 2001, pp. 2165–2170.
[37] W. Stach, L. Kurgan, Modeling software development project using fuzzy cognitive maps, Proc. 4th ASERC Workshop on Quantitative and Soft Software Engineering (QSSE'04), 2004, pp. 55–60.
[38] W. Stach, L. Kurgan, W. Pedrycz, M. Reformat, Parallel fuzzy cognitive maps as a tool for modeling software development project, Proc. 2004 North American Fuzzy Information Processing Society Conf. (NAFIPS'04), Banff, AB, 2004, pp. 28–33.
[39] M.A. Styblinski, B.D.
Meyer, Signal flow graphs versus fuzzy cognitive maps in application to qualitative circuit analysis, Internat. J. Man Mach. Studies 35 (1991) 175–186.
[40] C.D. Stylios, P.P. Groumpos, The challenge of modelling supervisory systems using fuzzy cognitive maps, J. Intel. Manuf. 9 (4) (1998) 339–345.
[41] C.D. Stylios, P.P. Groumpos, Fuzzy cognitive maps: a model for intelligent supervisory control systems, Comput. Ind. 39 (3) (1999) 229–238.
[42] C.D. Stylios, P.P. Groumpos, Fuzzy cognitive map in modeling supervisory control systems, J. Intel. & Fuzzy Systems 8 (2) (2000) 83–98.
[43] C.D. Stylios, P.P. Groumpos, Modeling complex systems using fuzzy cognitive maps, IEEE Trans. Systems Man, Cybern. Part A: Systems Humans 34 (1) (2004).
[44] A.K. Tsadiras, Using fuzzy cognitive maps for e-commerce strategic planning, Proc. 9th Panhellenic Conf. on Informatics (EPY'2003), 2003.
[45] A.K. Tsadiras, K. Margaritis, An experimental study of the dynamics of the certainty neuron fuzzy cognitive maps, Neurocomputing 24 (1999) 95–116.
[46] A. Vazquez, A balanced differential learning algorithm in fuzzy cognitive maps, Technical Report, Departament de Llenguatges i Sistemes Informatics, Universitat Politecnica de Catalunya (UPC), 2002.
[47] A. Wright, Genetic algorithms for real parameter optimization, Foundations of Genetic Algorithms, Morgan Kaufmann, Los Altos, CA, 1991, pp. 205–218.