Neural Networks and Fuzzy Logic

9.5 FUZZY CONTROL SYSTEMS

A control system is an arrangement of physical components designed to alter, to regulate, or to command, through a control action, another physical system so that it exhibits certain desired characteristics or behaviour. Control systems are typically of two types: open loop control systems, in which the control action is independent of the physical system output, and closed loop control systems (feedback control systems), in which the control action depends on the physical system output. Examples of open loop control systems are a toaster, in which the amount of heat is set by a human, and an automatic washing machine, in which the controls for water temperature, spin cycle time, and so on are preset by the human. In both these cases the control actions are not a function of the output of the toaster or the washing machine. Examples of feedback control are a room temperature thermostat, which senses room temperature and activates a heating or cooling unit when a certain threshold temperature is reached, and an autopilot mechanism, which makes automatic course corrections to an airplane when heading or altitude deviations from certain preset values are sensed by the instruments in the plane's cockpit.

Fig. 9.14 : A closed loop control system (blocks: compensator or controller, plant, sensor; signals: system input, error, system output)

In order to control any physical variable, we must first measure it. The system for measurement of the controlled signal is called a sensor. The physical system under control is called a plant. In a closed loop control system, certain forcing signals of the system are determined by the responses of the system. To obtain satisfactory responses and characteristics for the closed loop control system, it is necessary to connect an additional system, known as a compensator (or a controller), into the loop. The general form of a closed loop control system is shown in Fig. 9.14.
Control systems are sometimes divided into two classes. If the object of the control system is to maintain a physical variable at some constant value in the presence of disturbances, the system is called the regulatory type of control, or a regulator. The room temperature control and autopilot are examples of regulatory controllers. The second class of control systems are tracking controllers. In this scheme of control, a physical variable is required to follow or track some desired time function. An example of this type of system is an automatic aircraft landing system, in which the aircraft follows a "ramp" to the desired touch down point.

The control problem is stated as follows. The output, or response, of the physical system under control is adjusted as required by the error signal. The error signal is the difference between the actual response of the plant, as measured by the sensor system, and the desired response, as specified by a reference input.

The knowledge-base module in Fig. 9.34 contains knowledge about all the input and output fuzzy partitions. It will include the term set and the control rules. A simple fuzzy logic controller is shown in Fig. 9.37.

Fig. 9.37 : Simple fuzzy logic control system block diagram (recoverable block labels: scaling factors, normalization, output)

Fig. 9.49 : Schematics of the new furnace controller (recoverable block labels: load, filters, short term tendency, yesterday average, average outdoor temperature, rule inference, boiler heat characteristic, fuel)

In the second step, the degree of validity of the premise is weighted with the individual degree of support of the rule, resulting in the degree of truth for the conclusion. In the third step, all conclusions are combined using the maximum operator. The result of these steps is a fuzzy set. The centre-of-maximum defuzzification method is used to arrive at a real value from a fuzzy output.

Fig. 9.51 : Conventional performance test (temperature plotted against time from 0 h to 48 h; curves: fuzzy controller boiler set temperature, optimal boiler temperature, conventional controller boiler set temperature)

After completion of the design of the fuzzy controller and the definition of linguistic variables, membership functions, and rules, the system was compiled to the target hardware, i.e., to 8051 assembly language. With this technology, the fuzzy controller only uses 2.1 kbytes of the internal ROM area. By means of matrix rule representation and online development technology, the optimization of a complex fuzzy logic system containing 05 rules was done efficiently.

The system performance was evaluated by connecting both the conventional controller and the fuzzy controller to a test house. One such example is shown in Fig. 9.50.

The result of the comparative performance tests showed that the fuzzy controller was highly responsive to the actual heat requirement of the house. It was very reactive to sudden heat demand, like the return of house inhabitants from vacation. Additionally, the elimination of the outdoor temperature sensor saved about $30 in production costs and even more in installation cost.
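The inference steps described for the furnace controller (premise validity weighted by the rule's degree of support, conclusions combined with the maximum operator, then centre-of-maximum defuzzification) can be sketched in Python. This is only an illustration under stated assumptions, not the controller's actual rule base: the term names, peak positions, input degrees and rule weights are hypothetical, the premise is assumed to be combined with the minimum operator, and centre-of-maximum is taken here to mean the average of the term peak positions weighted by each term's degree of truth.

```python
# Hypothetical output terms for "boiler set temperature" and the position
# (degrees C) at which each term's membership function peaks.
TERM_PEAKS = {"low": 40.0, "medium": 60.0, "high": 80.0}


def infer(rules, inputs):
    """Return the degree of truth of every output term.

    rules  : list of (premise_names, degree_of_support, output_term)
    inputs : membership degree of each (hypothetical) input condition
    """
    truth = {term: 0.0 for term in TERM_PEAKS}
    for premise_names, support, output_term in rules:
        # Assumed first step: validity of the premise via the minimum operator.
        validity = min(inputs[name] for name in premise_names)
        # Second step: weight the validity with the rule's degree of support.
        conclusion = validity * support
        # Third step: combine all conclusions with the maximum operator.
        truth[output_term] = max(truth[output_term], conclusion)
    return truth


def centre_of_maximum(truth):
    """Weighted average of the term peaks (assumed centre-of-maximum form)."""
    total = sum(truth.values())
    if total == 0.0:
        raise ValueError("no rule fired")
    return sum(TERM_PEAKS[term] * mu for term, mu in truth.items()) / total


# Hypothetical fuzzified inputs and rules, for illustration only.
inputs = {"load_high": 0.8, "load_low": 0.2, "outdoor_cold": 0.5}
rules = [
    (["load_high", "outdoor_cold"], 1.0, "high"),
    (["load_low"], 0.5, "low"),
    (["outdoor_cold"], 0.8, "medium"),
]

truth = infer(rules, inputs)
set_temperature = centre_of_maximum(truth)
print(truth, set_temperature)
```

With these made-up numbers the three terms receive degrees of truth 0.1, 0.4 and 0.5, so the crisp set temperature is pulled toward the "high" peak without reaching it.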
By setting the boiler set temperature beneath the level typically used by a conventional controller in low-load periods, the fuzzy controller actually saved energy.

6.4 HETERO ASSOCIATIVE MEMORY NEURAL NETWORK

Associative memory neural networks are networks in which the weights are determined in such a way that the net can store a set of P pattern associations. Hetero associative networks are static networks: no non-linear or delay operations can be done using hetero associative networks. The weights may be found using the Hebb rule or the delta rule. The net finds an appropriate output vector that corresponds to an input vector x, which can be either one of the stored patterns or a new pattern.

Architecture

The architecture of hetero associative neural networks is shown in Fig. 6.1.

Fig. 6.1 : Architecture of a Hetero Associative Neural Net.

This architecture resembles a single layer feedforward network as discussed earlier. It consists of one layer of weighted interconnections. There exist 'n' number of input neurons in the input layer and 'm' number of output neurons in the output layer. The training process is based on the Hebb learning rule discussed earlier. This is a fully interconnected network in which the inputs and outputs are different, hence it is called a hetero associative network.

Application Algorithm

The weights of the network are obtained using the training algorithm. These weights are used along with the testing data, and the performance of the network is tested by applying the following procedure:

Step 1 : Initialize the weights using the Hebb or delta rule.
Step 2 : For each input vector do Steps 3 to 5.
Step 3 : Set the activations for the input layer units equal to the current vector $x_i$.
Step 4 : Compute the net input to the output units:
$$y_{in_j} = \sum_{i=1}^{n} x_i w_{ij}$$
Step 5 : Determine the activation of the output units:
$$y_j = \begin{cases} 1, & \text{if } y_{in_j} > 0 \\ 0, & \text{if } y_{in_j} = 0 \\ -1, & \text{if } y_{in_j} < 0 \end{cases}$$
For binary targets the activation function is
$$f(x) = \begin{cases} 1, & \text{if } x > 0 \\ 0, & \text{if } x \le 0 \end{cases}$$

Data Representation

The data can be represented by (-1, +1), which is called the bipolar representation. With binary representation, if each original pattern is a sequence of 1 or 0 responses, "missing data" would correspond to a response of 0, and so cannot be told apart from a genuine 0 response.

To improve recognition, two modifications are made.

First Modification

In this modification, the binary input and target vectors are converted into bipolar representations for the formation of the weight matrix. The input vectors used during testing and the response of the net are still represented in binary form. This is called hybrid data.

The weight matrix formed from the corresponding bipolar vectors is given by
$$w_{ij} = \sum_{p} \bigl(2 s_i(p) - 1\bigr)\bigl(2 t_j(p) - 1\bigr)$$
where
$$s(p) = [s_1(p), \ldots, s_i(p), \ldots, s_n(p)], \qquad t(p) = [t_1(p), \ldots, t_j(p), \ldots, t_m(p)]$$
Here $s(p)$ is the input vector and $t(p)$ is the target vector.

Second Modification

All vectors, i.e., training input, target output, testing input and the response of the net, are expressed in bipolar form. The main advantage of this modification is that it differentiates between missing and mistaken data. For instance, if the original pattern is a sequence of -1 and 1, the "missing data" will be 0, and mistaken data will be 1 if the original data is -1, and -1 if the original data is 1. The missing data is sometimes called a form of noise.

6.5 SOLVED PROBLEM IN HETERO ASSOCIATION NET

Example 6.1: A hetero associative net is trained by the Hebb outer product rule for input row vectors $S = (x_1, x_2, x_3, x_4)$ to output row vectors $t = (t_1, t_2)$. Find the weight matrix.

$S_1 = (1\ 1\ 0\ 0)$, $t_1 = (1\ 0)$
$S_2 = (1\ 1\ 1\ 0)$, $t_2 = (0\ 1)$
$S_3 = (0\ 0\ 1\ 1)$, $t_3 = (1\ 0)$
$S_4 = (0\ 1\ 0\ 0)$, $t_4 = (1\ 0)$

Solution:
Step 1: Initialise the weights to zero.
Step 2: Find the outer product for each input and target pair. For $S_1 = (1\ 1\ 0\ 0)$, $t_1 = (1\ 0)$, the outer product is given by $W_1 = S_1^T t_1$.

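$$W_1 = S_1^T t_1 = \begin{bmatrix} 1 \\ 1 \\ 0 \\ 0 \end{bmatrix} \begin{bmatrix} 1 & 0 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 1 & 0 \\ 0 & 0 \\ 0 & 0 \end{bmatrix}$$

Similarly,

$$W_2 = S_2^T t_2 = \begin{bmatrix} 1 \\ 1 \\ 1 \\ 0 \end{bmatrix} \begin{bmatrix} 0 & 1 \end{bmatrix} = \begin{bmatrix} 0 & 1 \\ 0 & 1 \\ 0 & 1 \\ 0 & 0 \end{bmatrix}$$

$$W_3 = S_3^T t_3 = \begin{bmatrix} 0 \\ 0 \\ 1 \\ 1 \end{bmatrix} \begin{bmatrix} 1 & 0 \end{bmatrix} = \begin{bmatrix} 0 & 0 \\ 0 & 0 \\ 1 & 0 \\ 1 & 0 \end{bmatrix}$$

$$W_4 = S_4^T t_4 = \begin{bmatrix} 0 \\ 1 \\ 0 \\ 0 \end{bmatrix} \begin{bmatrix} 1 & 0 \end{bmatrix} = \begin{bmatrix} 0 & 0 \\ 1 & 0 \\ 0 & 0 \\ 0 & 0 \end{bmatrix}$$

Step 3: The weight matrix to store all four patterns is the sum of the weight matrices for each stored pattern.

$$W = W_1 + W_2 + W_3 + W_4 = \begin{bmatrix} 1 & 0 \\ 1 & 0 \\ 0 & 0 \\ 0 & 0 \end{bmatrix} + \begin{bmatrix} 0 & 1 \\ 0 & 1 \\ 0 & 1 \\ 0 & 0 \end{bmatrix} + \begin{bmatrix} 0 & 0 \\ 0 & 0 \\ 1 & 0 \\ 1 & 0 \end{bmatrix} + \begin{bmatrix} 0 & 0 \\ 1 & 0 \\ 0 & 0 \\ 0 & 0 \end{bmatrix} = \begin{bmatrix} 1 & 1 \\ 2 & 1 \\ 1 & 1 \\ 1 & 0 \end{bmatrix}$$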
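The outer product computation of Example 6.1 can be checked with a short Python sketch. The function names `outer` and `hebb_weights` are illustrative choices, not names from the text; the training pairs are the four input and target vectors given in the example.

```python
def outer(s, t):
    """Outer product s^T t of a row vector s (length n) and a row vector t (length m)."""
    return [[si * tj for tj in t] for si in s]


def hebb_weights(pairs):
    """Sum the outer products of all (input, target) pairs, starting from zero weights."""
    n = len(pairs[0][0])
    m = len(pairs[0][1])
    weights = [[0] * m for _ in range(n)]      # Step 1: initialise weights to zero
    for s, t in pairs:                         # Step 2: one outer product per pair
        product = outer(s, t)
        for i in range(n):
            for j in range(m):
                weights[i][j] += product[i][j]  # Step 3: accumulate the sum
    return weights


# Training pairs of Example 6.1.
pairs = [
    ([1, 1, 0, 0], [1, 0]),
    ([1, 1, 1, 0], [0, 1]),
    ([0, 0, 1, 1], [1, 0]),
    ([0, 1, 0, 0], [1, 0]),
]

W = hebb_weights(pairs)
for row in W:
    print(row)
```

Each row of the printed matrix belongs to one input unit and each column to one output unit, matching the layout of the matrices in the worked solution.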

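Thus the weight matrix is determined using the outer products method of the Hebb rule.

Example 6.2: A hetero associative network is given with the training input vectors below. Find the weight matrix and test the network.

$S_1 = (1\ 1\ 0\ 0)$, $t_1 = (1\ 0)$
$S_2 = (0\ 1\ 0\ 0)$, $t_2 = (1\ 0)$
$S_3 = (0\ 0\ 1\ 1)$, $t_3 = (0\ 1)$
$S_4 = (0\ 0\ 1\ 0)$, $t_4 = (0\ 1)$

Solution:
Step 1: Initialise the weights using the Hebb outer product rule.

$$W_1 = S_1^T t_1 = \begin{bmatrix} 1 \\ 1 \\ 0 \\ 0 \end{bmatrix} \begin{bmatrix} 1 & 0 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 1 & 0 \\ 0 & 0 \\ 0 & 0 \end{bmatrix}, \qquad W_2 = S_2^T t_2 = \begin{bmatrix} 0 \\ 1 \\ 0 \\ 0 \end{bmatrix} \begin{bmatrix} 1 & 0 \end{bmatrix} = \begin{bmatrix} 0 & 0 \\ 1 & 0 \\ 0 & 0 \\ 0 & 0 \end{bmatrix}$$

$$W_3 = S_3^T t_3 = \begin{bmatrix} 0 \\ 0 \\ 1 \\ 1 \end{bmatrix} \begin{bmatrix} 0 & 1 \end{bmatrix} = \begin{bmatrix} 0 & 0 \\ 0 & 0 \\ 0 & 1 \\ 0 & 1 \end{bmatrix}, \qquad W_4 = S_4^T t_4 = \begin{bmatrix} 0 \\ 0 \\ 1 \\ 0 \end{bmatrix} \begin{bmatrix} 0 & 1 \end{bmatrix} = \begin{bmatrix} 0 & 0 \\ 0 & 0 \\ 0 & 1 \\ 0 & 0 \end{bmatrix}$$

$$W = W_1 + W_2 + W_3 + W_4 = \begin{bmatrix} 1 & 0 \\ 2 & 0 \\ 0 & 2 \\ 0 & 1 \end{bmatrix}$$

First Training Pattern

Step 2: For the first training pattern $x = (1\ 1\ 0\ 0)$, follow Steps 3 to 5.
Step 4: Compute the net inputs $y_{in_j} = \sum_{i=1}^{4} x_i w_{ij}$:
$$y_{in_1} = x_1 w_{11} + x_2 w_{21} + x_3 w_{31} + x_4 w_{41} = 1 \times 1 + 1 \times 2 + 0 \times 0 + 0 \times 0 = 3$$
$$y_{in_2} = x_1 w_{12} + x_2 w_{22} + x_3 w_{32} + x_4 w_{42} = 1 \times 0 + 1 \times 0 + 0 \times 2 + 0 \times 1 = 0$$
Step 5: Calculate $y_j = f(y_{in_j})$, where $f(x) = 1$ if $x > 0$ and $f(x) = 0$ if $x \le 0$:
$$y_1 = f(3) = 1, \qquad y_2 = f(0) = 0$$
This is the correct response for the first pattern.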

Second Training Pattern

Step 2: For the second training pattern $x = (0\ 1\ 0\ 0)$, repeat Steps 3 to 5.
$$y_{in_1} = x_1 w_{11} + x_2 w_{21} + x_3 w_{31} + x_4 w_{41} = 0 \times 1 + 1 \times 2 + 0 \times 0 + 0 \times 0 = 2$$
$$y_{in_2} = x_1 w_{12} + x_2 w_{22} + x_3 w_{32} + x_4 w_{42} = 0 \times 0 + 1 \times 0 + 0 \times 2 + 0 \times 1 = 0$$
Step 5: $y_1 = f(2) = 1$, $y_2 = f(0) = 0$. This is the correct response for the second training pattern.

Third Training Pattern

For $x = (0\ 0\ 1\ 1)$:
$$y_{in_1} = 0 \times 1 + 0 \times 2 + 1 \times 0 + 1 \times 0 = 0$$
$$y_{in_2} = 0 \times 0 + 0 \times 0 + 1 \times 2 + 1 \times 1 = 3$$
$$y_1 = f(0) = 0, \qquad y_2 = f(3) = 1$$

Fourth Training Pattern

For $x = (0\ 0\ 1\ 0)$:
$$y_{in_1} = 0 \times 1 + 0 \times 2 + 1 \times 0 + 0 \times 0 = 0$$
$$y_{in_2} = 0 \times 0 + 0 \times 0 + 1 \times 2 + 0 \times 1 = 2$$
$$y_1 = f(0) = 0, \qquad y_2 = f(2) = 1$$

Thus the weight matrix is determined by the input patterns themselves, and the net gives the correct response for each of the four training patterns.

The ability of the net to reproduce a stored pattern from a noisy input is used to determine the performance of the net. The performance of the net is better for bipolar vectors than for binary vectors. It is important to note that in an auto associative net the weights on the diagonal are set to zero (weights with no self-connection). The diagonal components are those which connect an input pattern component to the corresponding component in the output pattern. Setting these weights to zero improves the net's ability to generalise and may also increase the biological plausibility of the net. If the delta rule is used, setting the diagonal to zero prevents the training from producing the identity matrix for the weights.

6.6.1 Architecture

Figure 6.2 shows the architecture of an auto associative neural net. This architecture resembles a single layer feedforward network as discussed earlier. It consists of only one layer of weighted interconnections. There exist 'n' number of input neurons in the input layer and 'n' number of output neurons in the output layer. The training process is based on the Hebb learning rule discussed earlier. This is a fully interconnected network in which the inputs and the outputs are the same, hence it is called an auto associative network.

Fig. 6.2 : Architecture of an auto associative neural net.
Parameters Used

The parameters used in this net are as follows:
$s(p)$ : sample vector $(s_1(p), s_2(p), \ldots, s_n(p))$
$x$ : input vector $(x_1, x_2, \ldots, x_n)$
$y$ : output vector $(y_1, y_2, \ldots, y_n)$
$w_{ij}$ : weight between the i-th input unit and the j-th output unit

6.6.2 Training Algorithm

Initially, the weights of the auto associative network are taken as zero. This is a pattern-association supervised learning network in which both the inputs and the outputs are known; as a result, the activations are set for both the input and output units. The final weights are then calculated based on the Hebb learning rule as discussed in Section 6.1. The algorithm is given below. Note that the output units and the input units are the same.

Step 1: Initialize all the weights, $i = 1, 2, \ldots, n$, $j = 1, 2, \ldots, n$: $w_{ij} = 0$.
Step 2: For each vector to be stored, follow Steps 3 to 5.
Step 3: Set the activation for each input unit, $i = 1, \ldots, n$: $x_i = s_i$.
Step 4: Set the activation for each output unit, $j = 1, \ldots, n$: $y_j = s_j$.
Step 5: Adjust the weights for $i = 1$ to $n$ and $j = 1$ to $n$: $w_{ij}(\text{new}) = w_{ij}(\text{old}) + x_i y_j$.

The weights can also be determined from Hebb learning as
$$W = \sum_{p} s^T(p)\, s(p)$$

6.6.3 Application Algorithm

The performance of the network is evaluated using the application procedure. The stepwise application algorithm is given below. It is based purely on the calculation of the net input using the final weights obtained.

Step 1: Initialize the weights.
Step 2: For each testing input follow Steps 3 to 5.
Step 3: Set the activation of the input units equal to the input vector.

Solution:
(a) Step 1: Initialise the weights using the outer product Hebb rule, $W = s^T(p)\, s(p)$, with $s(p) = [1\ 1\ {-1}\ {-1}]$:
$$W = \begin{bmatrix} 1 \\ 1 \\ -1 \\ -1 \end{bmatrix} \begin{bmatrix} 1 & 1 & -1 & -1 \end{bmatrix} = \begin{bmatrix} 1 & 1 & -1 & -1 \\ 1 & 1 & -1 & -1 \\ -1 & -1 & 1 & 1 \\ -1 & -1 & 1 & 1 \end{bmatrix}$$
(b) Step 2: For the testing input vector perform Steps 3 to 5.
Step 3: $x = [1\ 1\ {-1}\ {-1}]$
Step 4: Compute the net input using the weight matrix $W$ obtained in Step 1:
$$y_{in} = xW = \begin{bmatrix} 1 & 1 & -1 & -1 \end{bmatrix} \begin{bmatrix} 1 & 1 & -1 & -1 \\ 1 & 1 & -1 & -1 \\ -1 & -1 & 1 & 1 \\ -1 & -1 & 1 & 1 \end{bmatrix} = [4\ 4\ {-4}\ {-4}]$$
Step 5: $y = f([4\ 4\ {-4}\ {-4}]) = [1\ 1\ {-1}\ {-1}]$

The response vector y is the same as the stored vector, so we can say that the input vector is recognized as a "known" vector.

(c) An auto associative net with one mistake in the input vector. In each case $y_{in} = xW$:

(1) $x = [-1\ 1\ {-1}\ {-1}]$: $y_{in} = [2\ 2\ {-2}\ {-2}]$, $y = f(y_{in}) = [1\ 1\ {-1}\ {-1}]$
(2) $x = [1\ {-1}\ {-1}\ {-1}]$: $y_{in} = [2\ 2\ {-2}\ {-2}]$, $y = f(y_{in}) = [1\ 1\ {-1}\ {-1}]$
(3) $x = [1\ 1\ 1\ {-1}]$: $y_{in} = [2\ 2\ {-2}\ {-2}]$, $y = f(y_{in}) = [1\ 1\ {-1}\ {-1}]$
(4) $x = [1\ 1\ {-1}\ 1]$: $y_{in} = [2\ 2\ {-2}\ {-2}]$, $y = f(y_{in}) = [1\ 1\ {-1}\ {-1}]$

Note that in each of these cases the input vector is recognised as "known".

(d) Consider an auto associative network with one component missing. The vectors are:

(1) $x = [0\ 1\ {-1}\ {-1}]$: $y_{in} = [3\ 3\ {-3}\ {-3}]$, $y = f(y_{in}) = [1\ 1\ {-1}\ {-1}]$
(2) $x = [1\ 0\ {-1}\ {-1}]$: $y_{in} = [3\ 3\ {-3}\ {-3}]$, $y = f(y_{in}) = [1\ 1\ {-1}\ {-1}]$
(3) $x = [1\ 1\ 0\ {-1}]$: $y_{in} = [3\ 3\ {-3}\ {-3}]$, $y = f(y_{in}) = [1\ 1\ {-1}\ {-1}]$
(4) $x = [1\ 1\ {-1}\ 0]$: $y_{in} = [3\ 3\ {-3}\ {-3}]$, $y = f(y_{in}) = [1\ 1\ {-1}\ {-1}]$

Thus the net recognizes the vectors formed when one component is missing.

Consider an auto associative network with two "missing entries" in the input vector $[1\ 1\ {-1}\ {-1}]$:

(1) $x = [0\ 0\ {-1}\ {-1}]$: $y_{in} = [2\ 2\ {-2}\ {-2}]$, $y = f(y_{in}) = [1\ 1\ {-1}\ {-1}]$
(2) $x = [0\ 1\ 0\ {-1}]$: $y_{in} = [2\ 2\ {-2}\ {-2}]$, $y = f(y_{in}) = [1\ 1\ {-1}\ {-1}]$
(3) $x = [0\ 1\ {-1}\ 0]$: $y_{in} = [2\ 2\ {-2}\ {-2}]$, $y = f(y_{in}) = [1\ 1\ {-1}\ {-1}]$
(4) $x = [1\ 0\ 0\ {-1}]$: $y_{in} = [2\ 2\ {-2}\ {-2}]$, $y = f(y_{in}) = [1\ 1\ {-1}\ {-1}]$
(5) $x = [1\ 0\ {-1}\ 0]$: $y_{in} = [2\ 2\ {-2}\ {-2}]$, $y = f(y_{in}) = [1\ 1\ {-1}\ {-1}]$
(6) $x = [1\ 1\ 0\ 0]$: $y_{in} = [2\ 2\ {-2}\ {-2}]$, $y = f(y_{in}) = [1\ 1\ {-1}\ {-1}]$

It is found that even in the case of two missing entries, the net recognizes the vectors formed.

Consider an auto associative net with two "mistakes" in the input vector, $x = [-1\ {-1}\ {-1}\ {-1}]$:
$$y_{in} = xW = [(-1 - 1 + 1 + 1)\ \ (-1 - 1 + 1 + 1)\ \ (1 + 1 - 1 - 1)\ \ (1 + 1 - 1 - 1)] = [0\ 0\ 0\ 0]$$
Thus the net does not recognize this input vector, which has two mistaken components.

From the above example we can conclude that the net is more tolerant of "missing data" than of "mistakes", since vectors with missing data are closer to the stored vector than vectors with mistakes.
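The auto associative example can be reproduced with a few lines of Python. This is a minimal sketch: the helper name `recall` and the labels in `cases` are illustrative, and, as in the worked example, the diagonal weights are left as computed rather than set to zero.

```python
def recall(x, weights):
    """Apply Steps 3 to 5: net input y_in = x W, then the bipolar activation."""
    n = len(weights[0])
    y_in = [sum(x[i] * weights[i][j] for i in range(len(x))) for j in range(n)]
    return [1 if v > 0 else (-1 if v < 0 else 0) for v in y_in]


stored = [1, 1, -1, -1]

# Hebb outer product W = s^T s for the single stored vector.
W = [[si * sj for sj in stored] for si in stored]

# One representative test vector for each situation discussed in the example.
cases = {
    "stored vector": [1, 1, -1, -1],
    "one mistake": [-1, 1, -1, -1],
    "one missing": [0, 1, -1, -1],
    "two missing": [0, 0, -1, -1],
    "two mistakes": [-1, -1, -1, -1],
}

results = {name: recall(x, W) for name, x in cases.items()}
for name, y in results.items():
    status = "recognized" if y == stored else "not recognized"
    print(f"{name}: {y} ({status})")
```

Only the vector with two mistaken components fails to recall the stored pattern, because its net input is zero on every output unit.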

Example 6.10: Consider an auto associative net with the bipolar input …

This transition among various degrees of membership can be thought of as conforming to the fact that the boundaries of the fuzzy sets are vague and ambiguous. Hence, membership of an element from the universe in this set is measured by a function that attempts to describe vagueness and ambiguity.

Definition

A fuzzy set, then, is a set containing elements that have varying degrees of membership in the set. This idea is in contrast with classical, or crisp, sets because members of a crisp set would not be members unless their membership was full, or complete, in that set (i.e., their membership is assigned a value of 1). Elements of a fuzzy set are mapped to a universe of membership values using a function-theoretic form. The fuzzy sets are denoted in this text by a set symbol with a tilde understrike; so, for example, $\underset{\sim}{A}$ would be the fuzzy set A. This function maps elements of a fuzzy set A to a real numbered value on the interval 0 to 1. If an element in the universe, say x, is a member of fuzzy set A, then this mapping is given by $\mu_A(x) \in [0, 1]$. This mapping is shown in Fig. 7.17 for a typical fuzzy set.

A notation convention for fuzzy sets when the universe of discourse X is discrete and finite is as follows for a fuzzy set A:
$$A = \left\{ \frac{\mu_A(x_1)}{x_1} + \frac{\mu_A(x_2)}{x_2} + \cdots \right\} = \left\{ \sum_i \frac{\mu_A(x_i)}{x_i} \right\}$$
When the universe X is continuous and infinite, the fuzzy set A is denoted by
$$A = \left\{ \int \frac{\mu_A(x)}{x} \right\}$$
In both notations the horizontal bar is a delimiter, not a quotient, and the summation and integral signs denote the collection of elements, not arithmetic operations.

Example 7.10: Let $X = \{g_1, g_2, g_3, g_4, g_5\}$ be the reference set of students. Let A be the fuzzy set of "smart" students, where "smart" is a fuzzy linguistic term.
$$A = \{(g_1, 0.4), (g_2, 0.5), (g_3, 1), (g_4, 0.9), (g_5, 0.8)\}$$
Here A indicates that the smartness of $g_1$ is 0.4, that of $g_2$ is 0.5, and so on, when graded over a scale of 0 to 1.

Though fuzzy sets model vagueness, it needs to be realized that the definition of the sets varies according to the context in which they are used. Thus, the fuzzy linguistic term "tall" could have one kind of fuzzy set while referring to the height of a building and another kind of fuzzy set while referring to the height of human beings.

Fig. 7.17 : Membership function for fuzzy set A.

7.9 MEMBERSHIP FUNCTION

The membership function values need not always be described by discrete values. Quite often, these turn out to be described by a continuous function.

The fuzzy membership function for the fuzzy linguistic term "cool" relating to temperature may turn out to be as illustrated in Fig. 7.18.

Fig. 7.18 : Continuous Membership Function for "Cool".

A membership function can also be given mathematically as
$$\mu_A(x) = \frac{1}{(1 + x)^2}$$
The graph is as shown in Fig. 7.19.

Fig. 7.19 : Continuous Membership Function Dictated by a Mathematical Function.

Different shapes of membership functions exist. The shapes could be triangular, trapezoidal, curved or their variations as shown in Fig. 7.20.

Fig. 7.20 : Different Shapes of Membership Function Graphs.

Example 7.11: Consider the set of people in the following age groups:

0-10, 10-20, 20-30, 30-40, 40-50, 50-60, 60-70, 70 and above

The fuzzy sets "young", "middle-aged", and "old" are represented by the membership function graphs as illustrated in Fig. 7.21.

Fig. 7.21 : Examples of Fuzzy Sets Expressing "Young", "Middle-aged", and "Old".

7.10 FUZZY SET OPERATIONS

Given X to be the universe of discourse and A and B to be fuzzy sets with $\mu_A(x)$ and $\mu_B(x)$ as their respective membership functions, the basic fuzzy set operations are as follows:

NEURAL NETWORKS IN FORECASTING

Forecasting techniques used today are based on traditional linear or non-linear statistical models, such as regression analysis. Although these traditional models remain in use, in today's markets it is becoming increasingly important to be able to more quickly and accurately predict trends in order to maintain competitiveness. More specifically, it is becoming necessary for forecasting models to be able to capture relationships in the data while allowing for high levels of noisy data and chaotic components.

In the light of changing business environments, managers are feeling the need for more flexible forecasting models. Specifically, forecasting models need to better allow for processing large amounts of data originating from many sources and many locations. Currently, however, most of the processing of information that occurs in business is done by computers that process information sequentially from a single central processing unit. This processing unit generates a result directly from the hard coding of the problem into the computer by a programmer. Due to the need for improved and advanced processing, businesses have turned their focus to the idea that information processing could take place through mechanisms other than traditional models.

One specific method of information processing that is being focused upon today utilizes a system that mirrors the organization and structure of the human nervous system. Such a system would consist of "parallel" computers equipped with multiple processing elements aligned to operate in parallel to process information. This method of forecasting is referred to as neural networking and is accomplished through a forecasting tool that has become known as an artificial neural network (ANN).
Such networks have been used in the medical, science and engineering fields.

Operation of a Neural Network

A neural network is an information processing model based upon the functioning of neurons, or nerve cells, found in the human brain and nervous system. Using this model, researchers are able to create an information processing system for use in forecasting that operates in the same manner as the human nervous system. In a neural network model, information is processed through interconnected networks that work to transfer information through a signaling process. Neural networks are typically made up of three layers of neurons: input layers, hidden layers and output layers. Also within this model are assigned weights, representing the knowledge base of the system, and a transfer function that is used to process the data and represents the non-linear properties of the neuron.

Although these three layers, the assigned weights and the transfer function make up the typical neural network structure, there are many choices when designing such a network. For example, all layers and neurons can vary in number depending on the characteristics, such as amount and nature, of the data to be used and the desired output result. More specifically, although using one hidden layer is adequate to map any input/output relationship, it may not be the most appropriate design in certain forecasting circumstances. This is also the case with the output layer.

The input layer is the first layer of neurons and is where all of the known external variables and data are input. Each input neuron represents a separate variable, and various line connectors join these input layers with the middle, or hidden, layers of the network. The connectors are assigned weights according to their level of importance, since they are the location of the knowledge pool that exists within the network.
The basic process that occurs between the input and hidden layers begins with the inputting of the external data. The values of this input data are then multiplied by the appropriate weights, as determined by the back propagation algorithm, and summed within the hidden layer. This sum is then converted through a transfer function into an output value. This output value appears in the last layer, known as the output layer, which typically contains only one neuron since only one output is usually requested. One output request is typically recommended because determining more than one output has proven to generate less than desirable results.

Since the neural network operates to process data in the same manner as the human brain functions, the neural network needs the ability to "learn" information as opposed to being "programmed". This learning ability of the neural network is accomplished through intense training of the network by providing it with numerous, reliable, and correct examples. During the training phase, the overall goal is to determine the most accurate weights to be assigned to the connector lines. Also during training, the output is computed repeatedly and the result is compared to the preferred output given by the training data. Any variance is considered a training error, and it is important for this training error to be as small as possible so that the forecasted output is reliable. In order to minimize this error, the originally assigned weights are adjusted until the error declines. This weight adjustment is accomplished through the use of a training algorithm such as back propagation. By adjusting the weights, the error is reduced continually until a point is reached that represents the least amount of acceptable error. At this point an accurate forecast can be produced.

Advantage of Neural Networks

The purpose of using neural networks is to be able to forecast data patterns that are too complex for the traditional statistical models.
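The operation described in the preceding section (inputs multiplied by weights, summed in the hidden layer, passed through a transfer function, and weights adjusted until the training error declines) can be sketched as a very small network in Python. This is an illustrative sketch only: the sigmoid transfer function, the two hidden neurons, the starting weights, the learning rate and the toy training samples are all assumptions chosen for the example, not values from the text.

```python
import math


def sigmoid(z):
    """Assumed transfer function: squashes a weighted sum into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


class TinyNetwork:
    """Two inputs, two hidden neurons, one output neuron."""

    def __init__(self):
        # Fixed, hypothetical starting weights so that every run is repeatable.
        self.w_hidden = [[0.1, -0.2], [0.3, 0.4]]  # w_hidden[h][i]
        self.b_hidden = [0.0, 0.0]
        self.w_out = [0.2, -0.1]
        self.b_out = 0.0

    def forward(self, x):
        hidden = [sigmoid(sum(w * xi for w, xi in zip(ws, x)) + b)
                  for ws, b in zip(self.w_hidden, self.b_hidden)]
        output = sigmoid(sum(w * h for w, h in zip(self.w_out, hidden)) + self.b_out)
        return hidden, output

    def train_step(self, x, target, rate):
        """One back propagation update for the squared error 0.5 * (output - target)^2."""
        hidden, output = self.forward(x)
        delta_out = (output - target) * output * (1.0 - output)
        # Hidden error terms use the output weights before they are updated.
        delta_hidden = [delta_out * w * h * (1.0 - h)
                        for w, h in zip(self.w_out, hidden)]
        for h in range(len(hidden)):
            self.w_out[h] -= rate * delta_out * hidden[h]
            for i in range(len(x)):
                self.w_hidden[h][i] -= rate * delta_hidden[h] * x[i]
            self.b_hidden[h] -= rate * delta_hidden[h]
        self.b_out -= rate * delta_out


def mean_squared_error(net, samples):
    return sum((net.forward(x)[1] - t) ** 2 for x, t in samples) / len(samples)


# Toy training data (hypothetical): the target is the average of the two inputs.
samples = [([0.0, 0.0], 0.0), ([0.0, 1.0], 0.5), ([1.0, 0.0], 0.5), ([1.0, 1.0], 1.0)]

net = TinyNetwork()
initial_error = mean_squared_error(net, samples)
for _ in range(2000):
    for x, t in samples:
        net.train_step(x, t, rate=0.5)
final_error = mean_squared_error(net, samples)
print(initial_error, final_error)
```

The single output neuron mirrors the recommendation above of requesting one output, and comparing the two printed errors shows the effect of repeated weight adjustment.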
Although neural networks are quickly becoming the wave of the future for forecasting, they continue to have both advantages and disadvantages. A strong advantage of neural networks is that, when properly trained, they can be considered experts with regard to the particular output project which they were designed to examine. The network structure can even be used to "provide projections given new situations and answer "what if" questions." In addition, the learning ability of neural networks allows them to adjust to dynamic and changing market environments, which makes them a much more flexible forecasting tool than traditional statistical models. An example of this level of flexibility is in the area of forecasting net asset values of mutual funds. When compared to regression analysis, neural network forecasting in this area was shown to generate a 40% increase in accuracy. This higher performance level is mainly attributed to the flexibility of the neural network and the ability to take into account all …

9.4 NEURAL NETWORKS IN CONTROL

Neural networks can act as an NN controller and can be used in the feedback path as an NN estimator for parameter estimation.

The applications dealt with here are the application of neural networks for speed control of an induction motor, the application of neural networks in pH control in a neutralization reaction, and finally the balancing of a broomstick.

In the speed control of the induction motor, the field vector control method is used and the BPN is used to find the error between the actual speed and the desired speed. In the next application, a BPN network is used for the controller and the parameter estimation of the neutralization process. When compared with the conventional p/n controller, a neural network employing the adaline network provides the best control action for balancing the broomstick.
Basic Concept of Control Systems

In a system, when the output quantity is controlled by varying the input quantity, the system is called a control system. The output quantity is called the controlled variable or response, and the input quantity is called the command signal or excitation. Control systems are of two types:
(a) Open Loop System.
(b) Closed Loop System.

(a) Open Loop System

Any physical system that does not automatically correct the variation in its output is called an open loop system. Open loop systems are inaccurate, and the changes in the output are not corrected automatically. The changes in the output are corrected by changing the input manually. The block diagram of the open loop system is shown in Fig. 9.7.

Fig. 9.7 : Open loop system (input, plant, output)

(b) Closed Loop System

Control systems in which the output has an effect upon the input quantity in such a manner as to maintain the desired output value are called closed loop systems. The open loop system is modified into a closed loop system by providing feedback. The provision of feedback automatically corrects the change in the output due to disturbances. Hence the closed loop system is also called an automatic control system.

The general block diagram of the closed loop control system is shown in Fig. 9.8. It consists of an error detector, a controller, a plant and feedback path elements.

Fig. 9.8 : Closed Loop Control System (reference, error detector, controller, plant, output, feedback sensor)

The reference signal corresponds to the desired output. The feedback path elements sample the output and convert it to a signal of the same type as the reference signal. The feedback signal is proportional to the output signal and is fed to the error detector. The error signal generated by the error detector is the difference between the reference signal and the feedback signal. The controller modifies and amplifies the error signal to produce the control action. The modified error signal is fed to the plant to correct its output.
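The difference between the two arrangements can be illustrated with a small simulation in Python. This is a sketch under assumptions that do not come from the text: the plant is taken to be a simple first-order system, the controller simply multiplies the error by a fixed gain, and the function name `simulate` and all numeric values are hypothetical.

```python
def simulate(closed_loop, disturbance=0.0, reference=1.0,
             gain=9.0, dt=0.01, steps=2000):
    """Return the plant output after `steps` time steps.

    Assumed plant: dy/dt = -y + u + disturbance (first-order, Euler integration).
    """
    y = 0.0
    for _ in range(steps):
        if closed_loop:
            error = reference - y      # error detector: reference minus feedback
            u = gain * error           # controller amplifies the error signal
        else:
            u = reference              # preset input, independent of the output
        y += dt * (-y + u + disturbance)
    return y


# How far does a constant disturbance push the output in each arrangement?
open_shift = abs(simulate(False, disturbance=0.5) - simulate(False))
closed_shift = abs(simulate(True, disturbance=0.5) - simulate(True))
print(open_shift, closed_shift)
```

In this sketch the same disturbance moves the closed loop output much less than the open loop output, which is the automatic correction attributed to feedback above. A pure gain controller still leaves a small residual offset, so the closed loop output settles near, not exactly at, the reference.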
The closed loop systems are less affected by noise. The closed loop systems are accurate even in the presence of non-linearities. The neural network is used in the following blocks:
• Error detector
• Controller
• Feedback path

Application of Neural Network in Control Systems

The applications included are:
• Neural network based controller for induction motor. In this application the neural networks are used as a controller to control the speed of the induction motor.
• Neural network control of the neutralization system. In this application the pH value is used to control the neutralization of the system. The pH value is controlled by controlling the flow rate of the base.
• Neural network based controller for position and angle of the broomstick balancer. In this application the neural network is used to control the balance of the broomstick.

Neural Network Based Controller for Induction Motor

The application of neural networks in the speed control of induction motors has emerged as one of the challenging tasks in the control system area. The generalized block diagram of the …

THE DYNAMICAL-SYSTEMS APPROACH TO MACHINE INTELLIGENCE

Neural and Fuzzy Systems as Function Estimators

Neural networks and fuzzy systems estimate input-output functions. Both are trainable dynamical systems. Sample data shapes and "programs" their time evolution. Unlike statistical estimators, they estimate a function without a mathematical model of how outputs depend on inputs. They are model-free estimators. They "learn from experience" with numerical and, sometimes, linguistic sample data.

Neural and fuzzy systems encode sampled information in a parallel-distributed framework. Both frameworks are numerical. We can prove theorems to describe their behavior and limitations. We can implement neural and fuzzy systems in digital or analog VLSI circuitry or in optical-computing media, in spatial-light modulators and holograms.
Artificial neural networks consist of numerous, simple processing units or "neurons" that we can globally program for computation. We can program or train neural networks to store, recognize, and associatively retrieve patterns or database entries; to solve combinatorial optimization problems; to filter noise from measurement data; to control ill-defined problems; in summary, to estimate sampled functions when we do not know the form of the functions.

The human brain contains roughly $10^{11}$ or 100 billion neurons [Thompson, 1985]. That number approximates the number of stars in the Milky Way Galaxy, and the number of galaxies in the known universe. As many as $10^{4}$ synaptic junctions may abut a single neuron. That gives roughly $10^{15}$ or 1 quadrillion synapses in the human brain. The brain represents an asynchronous, nonlinear, massively parallel, feedback dynamical system of cosmological proportions.

Artificial neural systems may contain millions of nonlinear neurons and interconnecting synapses. Future artificial neural systems may contain billions of real or virtual model neurons. In general no teacher supervises, stabilizes, or synchronizes these large-scale nonlinear systems. Many feedback neural networks can learn new patterns and recall old patterns simultaneously, and ceaselessly. Supervised neural networks can learn far more input-output pairs, or stimulus-response associations, than the number of neurons and synapses in the network architecture. Since neural networks do not use a mathematical model of how a system's output depends on its input (since they behave as model-free estimators), we can apply the same neural-network architecture, and dynamics, to a wide variety of problems.

Like brains, neural networks recognize patterns we cannot even define. We call this property recognition without definition. Who can define a tree, a pillow, or their own face to the satisfaction of a computer pattern-recognition system?
These and most concepts we learn ostensively, by pointing out examples. We do not learn them as we learn the definition of a circle. We abstract these concepts from sample data, just as a child abstracts the color red from observed red apples, red wagons, and other red things, or as Plato abstracted triangularity from considered sample triangles.

Recognition without definition characterizes much intelligent behavior. It enables systems to generalize. Dogs, lizards, and slugs recognize multitudes of unforeseen patterns without, of course, any ability to define them. Descriptive natural languages developed only yesterday in human evolution. Yet a great deal of modern philosophy and behaviorist psychology, influenced by formal logic, has insisted on concept definition preceding recognition or even discussion. Below we discuss how this insistence has helped shape the field of artificial intelligence and its emblem, the expert system.

Neural networks store pattern or function information with distributed encoding. They superimpose pattern information on the same associative-memory medium, namely on the many synaptic connections between neurons. Distributed encoding enables neural networks to complete partial patterns and "clean up" noisy patterns. So it helps neural networks estimate continuous functions.

Distributed encoding endows neural networks with fault tolerance and "graceful degradation." If we successively rip out handfuls of synaptic connections from a neural network, the network tends to smoothly degrade in performance, not abruptly fail. Computers and digital VLSI chips do not gracefully degrade when their components fail. Natural selection seems to have favored distributed encoding in brains, at least in sections of brains.

Neural networks, and brains, pay a price for distributed encoding: crosstalk.
Distributed encoding produces crosstalk or interference between stored patterns. Similar patterns may clump together. New patterns may crowd out older learned patterns. Older patterns may distort newer patterns.

Crosstalk limits the neural network's storage capacity. Different learning schemes provide different storage capacities. The number of neurons bounds the number of patterns a neural network can store reliably with the simplest unsupervised learning schemes. Even for more sophisticated supervised learning schemes, storage capacity ultimately depends on the number of network neurons and synapses, as well as on their function. Dimensionality limits capacity.

Biological neurons and synapses motivate the neural network's topology and dynamics. We interpret neurons as simple input-output functions, threshold switches for two-state neurons and asymptotic threshold switches for continuous neurons. We interpret synapses as adjustable weights. In neural analog VLSI chips [Mead, 1989], [...] an adaptive function estimator. Indeed, [...] commercial adaptive estimators are simple, usually linear, neural networks. These include antennae beam formers, high-speed modems, and echo-cancellers for long-distance telephone calls.

Fuzzy Systems and Applications

Fuzzy systems store banks of fuzzy associations or common-sense "rules." A fuzzy traffic controller might contain the fuzzy association "If traffic is heavy in this direction, then keep the light green longer." Fuzzy phenomena admit degrees. Some traffic configurations are heavier than others. Some green-light durations are longer than others. The single fuzzy association (HEAVY, LONGER) encodes all these combinations.

Fuzzy systems are even newer than neural systems. Yet already engineers have successfully applied fuzzy systems in many commercial areas.
Fuzzy systems "intelligently" automate subways; focus cameras and camcorders; tune color televisions; control automobile transmissions, cruise controllers, and emergency braking systems; defrost refrigerators; control air conditioners; automate washing machines and vacuum sweepers; guide robot-arm manipulators; invest in securities; control traffic lights, elevators, and cement mixers; recognize Kanji characters; select golf clubs; even arrange flowers. Most of these applications originated in Japan, though fuzzy products are sold and applied throughout the world.

Until very recently, Western scientists, engineers, and mathematicians have overlooked, discounted, or even attacked early versions of fuzzy theory, usually in favor of probability theory. Below, and especially in Chapter 7, we examine this philosophical resistance in more detail and present a new geometrical theory of multivalued or "fuzzy" sets and systems.

Fuzzy systems "reason" with parallel associative inference. When asked a question or given an input, a fuzzy system fires each fuzzy rule in parallel, but to different degree, to infer a conclusion or output. Thus fuzzy systems reason with sets, "fuzzy" or multivalued sets, instead of bivalent propositions. This generalizes the Aristotelian logical framework that still dominates science and engineering. In one second a digital fuzzy VLSI chip may execute thousands, perhaps millions, of these parallel-associative set inferences. We measure such chip performance in FLIPS, fuzzy logical inferences per second.

Fuzzy systems estimate sampled functions from input to output. They may use linguistic (symbolic) or numeric samples. An expert may articulate linguistic associations such as (HEAVY, LONGER). Or a fuzzy system may adaptively infer and modify its fuzzy associations from representative numerical samples. In the latter case, neural and fuzzy systems naturally combine.
The combination resembles an adaptive system with sensory and cognitive components. Neural parameter estimators embed directly in an overall fuzzy architecture. Neural networks "blindly" generate and refine fuzzy rules from training data. Chapters 8 through 11 describe and illustrate these adaptive fuzzy systems.

Adaptive fuzzy systems learn to control complex processes very much as we do. They begin with a few crude rules of thumb that describe the process. Experts may give them the rules. Or they may abstract the rules from observed expert behavior. Successive experience refines the rules and, usually, improves performance.

NEURONAL DYNAMICS I: ACTIVATIONS AND SIGNALS (CHAP. 2)

NEURONS AS FUNCTIONS

Neurons behave as functions. Neurons transduce an unbounded input activation $x(t)$ at time $t$ into a bounded output signal $S(x(t))$. Usually a sigmoidal or S-shaped curve, as in Figure 2.1, describes the transduction. A sigmoidal curve also describes the input-output behavior of many operational amplifiers. For instance, the logistic signal function

$$S(x) = \frac{1}{1 + e^{-cx}}$$

is sigmoidal and strictly increases for positive scaling constant $c > 0$. Strict monotonicity implies that the activation derivative of $S$ is positive:

$$S' = \frac{dS}{dx} = cS(1 - S) > 0$$

The threshold signal function (dashed line) in Figure 2.1 illustrates a nondifferentiable signal function. In general, signal functions are piecewise differentiable. The family of logistic signal functions, indexed by $c$, approaches asymptotically the threshold function as $c$ increases to positive infinity. Then $S$ transduces positive activations $x$ to unity signals, negative activations to zero signals. $S$ would transduce the four-neuron vector of activations $(-6 \;\; 350 \;\; 49 \;\; -689)$ to the four-dimensional bit vector of signals $(0 \;\; 1 \;\; 1 \;\; 0)$.

FIGURE 2.1 Signal $S(x)$ as a bounded monotone-nondecreasing function of activation $x$. Dashed curve defines a threshold signal function.
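The logistic signal function, its derivative identity $S' = cS(1 - S)$, and the threshold limit can be checked numerically. The following is a minimal sketch, not code from the text: the function names, the steepness value `c=50.0`, and the `previous` argument (one of the three zero-activation conventions the text permits) are illustrative assumptions.

```python
import math

def logistic_signal(x, c=1.0):
    """Logistic signal S(x) = 1 / (1 + exp(-c*x)) for scaling constant c > 0."""
    # Branch on the sign of x so that exp() never receives a large positive
    # argument; both branches compute the same function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-c * x))
    z = math.exp(c * x)
    return z / (1.0 + z)

def logistic_derivative(x, c=1.0):
    """Activation derivative dS/dx = c * S * (1 - S)."""
    s = logistic_signal(x, c)
    return c * s * (1.0 - s)

def threshold_signal(x, previous=0.0):
    """Threshold signal: 1 for positive activation, 0 for negative.

    At x == 0 this sketch returns the previous signal value (an assumed
    choice among the conventions mentioned in the text).
    """
    if x > 0:
        return 1.0
    if x < 0:
        return 0.0
    return previous

# The four-neuron activation vector used in the text.
activations = [-6.0, 350.0, 49.0, -689.0]

# A steep logistic (large c) already behaves like the threshold function.
steep = [round(logistic_signal(x, c=50.0)) for x in activations]
hard = [int(threshold_signal(x)) for x in activations]
print(steep, hard)
```

Both lists reproduce the bit vector quoted in the text. In floating point the derivative underflows to zero for very large activations, so the strict inequality $S' > 0$ holds only analytically there.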
A discontinuity occurs at the zero activation value, which equals the signal function's "threshold." We can arbitrarily transduce zero activations to unity, zero, or the previous signal value. Zero activation values occur less frequently in networks with many neurons.

NEURONAL DYNAMICAL SYSTEMS

We describe the neuronal dynamical system by a system of first-order differential or difference equations that govern the time evolution of the neuronal activations or membrane potentials. Different differential equations govern the synaptic dynamical system, as we discuss in Chapters 4 through 6. For the fields $F_X$ and $F_Y$, we denote the activation differential equations as

$$\dot{x}_1 = g_1(F_X, F_Y, \ldots) \tag{2-1}$$
$$\vdots$$
$$\dot{x}_n = g_n(F_X, F_Y, \ldots) \tag{2-2}$$
$$\dot{y}_1 = h_1(F_X, F_Y, \ldots) \tag{2-3}$$
$$\vdots$$
$$\dot{y}_p = h_p(F_X, F_Y, \ldots) \tag{2-4}$$

or, in vector notation,

$$\dot{\mathbf{x}} = \mathbf{g}(F_X, F_Y, \ldots) \tag{2-5}$$
$$\dot{\mathbf{y}} = \mathbf{h}(F_X, F_Y, \ldots) \tag{2-6}$$

where $x_i$ and $y_j$ denote respectively the activation time functions of the $i$th neuron in $F_X$ and the $j$th neuron in $F_Y$. The arguments of the $g_i$ and $h_j$ functions also include synaptic and input information.

We do not include time as an independent variable. As a result, in dynamical systems theory, neural-network models classify as autonomous dynamical systems. Nonautonomous dynamical systems might allow the change in activation $x_i$ to depend, say, additively on $t^2$. Autonomous systems are usually easier to analyze than nonautonomous systems.

Time does play a special role in neuronal dynamics: time is "fast" at the neuronal level. In mammalian neural systems, membrane fluctuations occur at the millisecond level. In hardware or computer implementations of neural networks, neuronal fluctuations can in principle occur at the nanosecond level. In contrast, time is "slow" at the synaptic level. In mammalian neural systems, synaptic fluctuations occur at the second or minute level. We think faster than we learn.
Note the absence of second-order time derivatives (accelerations) and partial derivatives in (2-1) through (2-6). This represents one distinction between neural-network models and the models of classical "neural modeling," where often the differential equations give a detailed, multivariable description of how individual neurons or synapses behave.

SYNAPTIC DYNAMICS I: UNSUPERVISED LEARNING (CHAP. 4)

LEARNING AS ENCODING, CHANGE, AND QUANTIZATION

Learning encodes information. A system learns a pattern if the system encodes the pattern in its structure. The system structure changes as the system learns the information. We may use a behavioristic encoding criterion. Then the system has learned the stimulus-response pair $(\mathbf{x}_i, \mathbf{y}_i)$ if it responds with $\mathbf{y}_i$ when $\mathbf{x}_i$ stimulates the system. $\mathbf{x}_i$ may represent a spectral distribution, and $\mathbf{y}_i$ a pattern-class label. Or $\mathbf{x}_i$ may represent the musical notes in a bar of music text, and $\mathbf{y}_i$ the corresponding control vector of depressed piano keys.

The input-output pair $(\mathbf{x}_i, \mathbf{y}_i)$ represents a sample from the function $f: R^n \to R^p$. The function $f$ maps $n$-vectors $\mathbf{x}$ to $p$-vectors $\mathbf{y}$. The system has learned the function $f$ if the system responds with $\mathbf{y}$, and $\mathbf{y} = f(\mathbf{x})$, when $\mathbf{x}$ confronts the system for all $\mathbf{x}$. The system has partially learned, or approximated, the function $f$ if the system responds with $\mathbf{y}'$, which is "close" to $\mathbf{y} = f(\mathbf{x})$, when presented with an $\mathbf{x}'$ "close" to $\mathbf{x}$. The system maps similar inputs to similar outputs, and so estimates a continuous function.

Learning involves change. A system learns or adapts or "self-organizes" when sample data changes system parameters. In neural networks, learning means any change in any synapse. We do not identify learning with change in a neuron, though in some sense a changed neuron has learned its new state. Synapses change more slowly than neurons change.
We learn more slowly than we "think." This reflects the conventional interpretation of learning as semipermanent change. We have learned calculus if our calculus-exam-taking behavior has changed from failing to passing, and has stayed that way for some time. In the same sense we can say that the artist's canvas learns the pattern of paints the artist smears on its surface, or that the overgrown lawn learns the well-trimmed pattern the lawnmower imparts. In these cases the system learns when pattern stimulation changes a memory medium and leaves it changed for some comparatively long stretch of time.

Learning also involves quantization. Usually a system learns only a small proportion of all patterns in the sampled pattern environment. The number of possible samples may well be infinite. The discrete index notation "$(\mathbf{x}_i, \mathbf{y}_i)$" reflects this sharp disparity between the number of actually learned patterns and the number of possibly learned patterns. In general there are continuum many function samples, or random-vector realizations, $(\mathbf{x}, \mathbf{y})$.

Memory capacity is scarce. An adaptive system must efficiently, and continually, replace the patterns in its limited memory with patterns that accurately represent the sampled pattern environment. Learning replaces old stored patterns with new patterns. Learning forms "internal representations" or prototypes of sampled patterns.

Learned prototypes define quantized patterns. Neural network models represent prototype patterns as vectors of real numbers. So we can alternatively view learning as a form of adaptive vector quantization (AVQ) [Pratt, 1978].

The AVQ perspective embeds learning in a dynamic geometry. Learned prototype vectors define synaptic points $\mathbf{m}_i$ in some sufficiently large pattern space $R^n$. Then the system learns if and only if some point $\mathbf{m}_i$ moves in the pattern space $R^n$. The prototypes $\mathbf{m}_i$ gradually wiggle about $R^n$ as learning unfolds.
Figure 4.1 illustrates a two-dimensional snapshot of this dynamical $n$-dimensional process.

Vector quantization may be optimal according to different criteria. The prototypes may spread themselves out so as to minimize the mean-squared error of vector quantization or to minimize some other numerical performance criterion. More generally, the quantization vectors should estimate the underlying unknown probability distribution of patterns. The distribution of prototype vectors should statistically resemble the unknown distribution of patterns.

Uniform sampling probability provides an information-theoretic criterion for an optimal quantization. If we choose a pattern sample at "random," according to [...]

Supervised and Unsupervised Learning in Neural Networks

The distinction between supervised and unsupervised learning depends on information. The learning adjectives "supervised" and "unsupervised" stem from pattern-recognition theory. The distinction depends on whether the learning algorithm uses pattern-class information. Supervised learning uses pattern-class information; unsupervised learning does not.

An unknown probability density function $p(\mathbf{x})$ describes the continuous distribution of patterns $\mathbf{x}$ in the pattern space $R^n$. Learning seeks only to accurately estimate $p(\mathbf{x})$. The supervision in supervised learning provides information about the pattern density $p(\mathbf{x})$. This information may be inaccurate.

The supervisor assumes a pattern-class structure and perhaps other $p(\mathbf{x})$ properties. The supervisor may assume that $p(\mathbf{x})$ has few or equally likely modes or maximum values. Or the supervisor may assume that $p(\mathbf{x})$ does not change with time, that random quantities are stationary in some sense. Or the supervisor may assume that $p(\mathbf{x})$-distributed random vectors have finite covariances or resemble Gaussian-distributed random vectors. Unsupervised learning makes no $p(\mathbf{x})$ assumptions. It uses minimal information.
Supervised learning algorithms depend on the class membership of each training sample $\mathbf{x}$. Suppose the disjoint pattern or decision classes $D_1, \ldots, D_k$ exhaustively partition the pattern space $R^n$. Then pattern sample $\mathbf{x}$ belongs to some class $D_j$ and not to any other class $D_i$: $\mathbf{x} \in D_j$ and $\mathbf{x} \notin D_i$ for all $i \neq j$. (With zero probability $\mathbf{x}$ can belong to two or more classes.)

Class-membership information allows supervised learning algorithms to detect pattern misclassifications and perhaps compute an error signal or vector. Error information reinforces the learning process. It rewards accurate classifications and punishes misclassifications.

Unsupervised learning algorithms use unlabelled pattern samples. They "blindly" process the pattern sample $\mathbf{x}$. Unsupervised learning algorithms often have less computational complexity and less accuracy than supervised learning algorithms. Unsupervised learning algorithms learn rapidly, often on a single pass of noisy data. This makes unsupervised learning practical in many high-speed real-time environments, where we may not have enough time, information, or computational precision to use supervised techniques.

We similarly distinguish supervised and unsupervised learning in neural networks. Supervised learning usually refers to estimated gradient descent in the space of all possible synaptic-value combinations. We estimate the gradient of an unknown mean-squared performance measure that depends on the unknown probability density function $p(\mathbf{x})$. The supervisor uses class-membership information to define a numerical error signal or vector, which guides the estimated gradient descent. Chapter 5 presents supervised and unsupervised (competitive) learning in the statistical framework of stochastic approximation.

Unsupervised synaptic learning refers to how biological synapses modify their parameters with physically local information about neuronal signals. The synapses do not use the class membership of training samples. They process raw unlabelled data.

Unsupervised learning systems adaptively cluster patterns into clusters or decision classes $D_j$. Competitive learning systems evolve "winning" neurons in a neuronal competition for activation induced by randomly sampled input patterns. Then synaptic fan-in vectors tend to estimate pattern-class centroids. The centroids depend explicitly on the unknown underlying pattern density $p(\mathbf{x})$. This illustrates why we do not need to learn if we know $p(\mathbf{x})$. Then only numerical, combinatorial, or optimization tasks remain. Learning is a means to a computational end.

Other unsupervised neural systems evolve attractor basins in the pattern state space $R^n$. Attractor basins correspond to pattern classes. Feedback dynamics allocate the basins in width, position, and number. Chapter 6 studies unsupervised basin formation in the ABAM and RABAM theorems.

First-order difference or differential equations define unsupervised learning laws. In general, stochastic differential equations define unsupervised learning laws. The differential equations describe how synapses evolve with locally available information.

Local information is information physically available to the synapse. The synapse has access to this information only briefly. Local information usually involves synaptic properties or neuronal signal properties. In mammalian brains or opto-electronic integrated circuits, synapses may have local access to other types of information: glial cells, inter-cellular fluids, specific and nonspecific hormones, electromagnetic effects, light pulses, and other "interference" sources. We shall lump these phenomena together and model them as net random unmodelled effects.
These noisy unmodelled effects give a Brownian-motion nature to synaptic equilibria but usually do not affect the structure of global network computations.

Biological synapses learn locally and without supervision. Neuroscientist Richard Thompson [1986] summarized biological synaptic learning as follows: "All evidence to date indicates that the mechanisms of memory storage are local and do not involve the formation of new projection pathways. Local changes could include the formation of new synapses, structural and chemical alterations in neurons and synapses, and alterations in membrane properties that influence functional properties of preexisting synapses."

Locality allows asynchronous synapses to learn in real time. The synapses need not wait for a global error message, a message that may never come. Similarly, air molecules vibrate and collide locally without global temperature information, though their behavior produces this global information. Economic agents locally modify their behavior without knowing how they daily, or hourly, affect their national interest rate or their gross national product.

Locality also shrinks the function space of feasible unsupervised learning laws. Synapses have local access to very limited types of information. Learning laws contain only synaptic, neuronal, and noise (unmodelled effects) terms.

Associativity further shrinks the function space. Neural networks associate patterns with patterns. They associate vector pattern $\mathbf{y}$ with vector pattern $\mathbf{x}$ as they learn the association $(\mathbf{x}, \mathbf{y})$. Neural networks estimate continuous functions $f: X \to Y$. More generally they estimate the unknown joint probability density function $p(\mathbf{x}, \mathbf{y})$.

Locally unsupervised synapses associate signals with signals. This leads to conjunctive, or correlation, learning laws constrained by locality. In the simplest case the signal product $S_i(x_i)S_j(y_j)$ drives the learning equation.
Neuroscientist Donald Hebb [1949] first developed the mechanistic theory of conjunctive synaptic learning in his book The Organization of Behavior.

SIGNAL HEBBIAN LEARNING

We can solve the deterministic first-order signal Hebbian learning law,

$$\dot{m}_{ij} = -m_{ij}(t) + S_i(x_i(t))\,S_j(y_j(t)) \tag{4-132}$$

to yield the integral equation

$$m_{ij}(t) = m_{ij}(0)\,e^{-t} + \int_0^t S_i(s)\,S_j(s)\,e^{s-t}\,ds \tag{4-133}$$

The integral remains in the solution because in general $x_i$ and $y_j$ depend on $m_{ij}$ through the activation differential equations discussed in Chapter 3.

The bounded signal functions $S_i$ and $S_j$ produce a bounded integral in (4-133). The unbounded activation product $x_i y_j$, found in early models of Hebbian learning, can produce an unbounded integral.

Recency Effects and Forgetting

Hebbian synapses learn an exponentially weighted average of sampled patterns. The solution to an inhomogeneous first-order ordinary differential equation dictates the average structure and the exponential weight. This holds for all four types of learning we have examined.

The exponential weight in the solution (4-133) induces a recency effect on the unsupervised learning. We learn and forget equally fast. Every day, and every night, we experience an exponential decrease in the information we retain from our day's experience. This well-known recency effect underlies the quote by philosopher David Hume at the beginning of this chapter. In this epistemological sense we remember more vividly our sensory experiences in the last twenty minutes than anything we have experienced in our past. Nothing is more vivid than now.

The exponential weight $e^{-t}$ on the prior synaptic "knowledge" $m_{ij}(0)$ in (4-133) arises from the forgetting term $-m_{ij}$ in the differential equation (4-132).
Indeed the forgetting law provides the simplest local unsupervised learning law:

$$\dot{m}_{ij} = -m_{ij} \tag{4-134}$$

The forgetting law illustrates the two key properties of biologically motivated unsupervised learning laws. It depends on only local information, the current synaptic strength $m_{ij}(t)$. And it equilibrates exponentially quickly, which allows real-time operation.

Asymptotic Correlation Encoding

Equation (4-133) implies that the synaptic matrix $M$ of long-term memory traces $m_{ij}$ asymptotically approaches the bipolar correlation matrix $X_k^T Y_k$ discussed in Chapter 3:

$$M = X_k^T Y_k \tag{4-135}$$

Equation (4-135) uses the abbreviated notation from Chapter 3. Here state vectors $X_k$ and $Y_k$ denote the bipolar signal vectors $S(\mathbf{x}_k)$ and $S(\mathbf{y}_k)$, and the signal values $S_i$ and $S_j$ equal 1 or $-1$.

Suppose the signal functions $S_i$ and $S_j$ are bipolar. Then the signal product $S_i S_j$ is also bipolar:

$$S_i S_j = \pm 1 \tag{4-136}$$

The signal product equals $-1$ iff one signal equals 1 and the other equals $-1$. The signal product equals 1 iff both signals equal 1 or both equal $-1$. So the signal product $S_i S_j$ behaves as an equivalence operator in a multivalued logic, as discussed in Chapter 1.

Consider the extreme case $S_i S_j = 1$. Then (4-132) reduces to the simple first-order linear ordinary differential equation

$$\dot{m}_{ij} + m_{ij} = 1 \tag{4-137}$$

This differential equation has solution

$$m_{ij}(t) = m_{ij}(0)\,e^{-t} + \int_0^t e^{s-t}\,ds \tag{4-138}$$
$$= m_{ij}(0)\,e^{-t} + 1 - e^{-t} \tag{4-139}$$
$$= 1 + [m_{ij}(0) - 1]\,e^{-t} \tag{4-140}$$

So

$$m_{ij}(t) \to 1 \quad \text{as } t \to \infty \tag{4-141}$$

for any finite initial synaptic value $m_{ij}(0)$.

Similarly in the other extreme, when $S_i S_j = -1$, $m_{ij}(t)$ converges to $-1$ exponentially quickly. So the signal Hebbian learning law approaches asymptotically, and exponentially quickly, the bipolar outer-product matrix (4-135) discussed in Chapter 3.

This asymptotic analysis assumes we present only one pattern to the learning system for an infinite length of time. The pattern washes away all prior learned pattern information $m_{ij}(0)$ exponentially quickly.

In general, unsupervised synapses learn on a single pass of noisy data.
They sample a training set representative of some continuous pattern cluster. Each sample tends to resemble the preceding and succeeding samples.

In practice discrete signal Hebbian learning requires that either we cycle through a training set, presenting each associative pattern sample $(S(\mathbf{x}_k), S(\mathbf{y}_k))$ for a minimal period of learning, or we present the more recent of the $m$ associative patterns for shorter periods of learning than we presented previous patterns. Mathematically this means we must use a diagonal fading-memory exponential matrix $W$, as discussed in Chapter 3:

$$X^T W Y = \sum_{k=1}^{m} w_k X_k^T Y_k \tag{4-142}$$

The weighted outer-product matrix $X^T W Y$ compensates for the exponential decay of learned information. $X$ and $Y$ denote matrices of bipolar signal vectors.

COMPETITIVE LEARNING

The deterministic competitive learning law

$$\dot{m}_{ij} = S_j\,[S_i - m_{ij}] \tag{4-165}$$

resembles the signal Hebbian learning law (4-132). If we distribute the competitive signal $S_j$ in (4-165),

$$\dot{m}_{ij} = -S_j\,m_{ij} + S_i S_j \tag{4-166}$$

we see that the competitive learning law uses the nonlinear forgetting term $-S_j m_{ij}$. The signal Hebbian learning law uses the linear forgetting term $-m_{ij}$.

So competitive and signal Hebbian learning differ in how they forget, not in how they learn. In both cases when $S_j = 1$ (when the $j$th competing neuron wins) the synaptic value $m_{ij}$ encodes the forcing signal $S_i$ and encodes it exponentially quickly. But, unlike a Hebbian synapse, a competitive synapse does not forget when its postsynaptic neuron loses, when $S_j = 0$. Then (4-165) reduces to the no-change relationship

$$\dot{m}_{ij} = 0 \tag{4-167}$$

while the signal Hebbian law (4-132) reduces to the forgetting law (4-134).

Signal Hebbian learning is distributed. Hebbian synapses encode pieces of every sampled pattern. They also forget pieces of every learned pattern as they learn fresh patterns.
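The difference in forgetting between the two laws can be seen by integrating (4-132) and (4-165) for a single synapse with constant signals. The sketch below is illustrative only: forward Euler integration, the step size `dt`, the initial value `m0`, and all function names are assumptions rather than anything prescribed by the text.

```python
import math

def euler_integrate(rate, m0, t_end, dt=1e-3):
    """Integrate dm/dt = rate(m) from m(0) = m0 up to t_end with forward Euler."""
    m = m0
    for _ in range(int(round(t_end / dt))):
        m += dt * rate(m)
    return m

def hebbian_rate(s_i, s_j):
    """Signal Hebbian law (4-132) with constant signals: dm/dt = -m + S_i S_j."""
    return lambda m: -m + s_i * s_j

def competitive_rate(s_i, s_j):
    """Competitive law (4-165) with constant signals: dm/dt = S_j (S_i - m)."""
    return lambda m: s_j * (s_i - m)

def hebbian_closed_form(m0, t):
    """Solution (4-140), valid only for the case S_i S_j = 1."""
    return 1.0 + (m0 - 1.0) * math.exp(-t)

m0, t_end = -0.5, 5.0

# Postsynaptic neuron wins (S_j = 1): both laws encode S_i = 1.
hebb_win = euler_integrate(hebbian_rate(1.0, 1.0), m0, t_end)
comp_win = euler_integrate(competitive_rate(1.0, 1.0), m0, t_end)

# Postsynaptic neuron loses (S_j = 0): Hebbian forgets, competitive holds.
hebb_lose = euler_integrate(hebbian_rate(1.0, 0.0), m0, t_end)
comp_lose = euler_integrate(competitive_rate(1.0, 0.0), m0, t_end)

print(hebb_win, hebbian_closed_form(m0, t_end))
print(comp_win, hebb_lose, comp_lose)
```

With a winning neuron both synapses approach 1 and the Hebbian trajectory tracks the closed form (4-140) up to the Euler discretization error. With a losing neuron the Hebbian synapse decays toward zero as in the forgetting law (4-134), while the competitive synapse keeps its initial value as in (4-167).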
In competitive learning, only the winning synaptic vector $(m_{1j}, \ldots, m_{nj})$ encodes the presently sampled pattern $S(\mathbf{x})$ or $\mathbf{x}$. If the sample patterns $S(\mathbf{x})$ or $\mathbf{x}$ persist long enough, competitive synapses behave as "grandmother" synapses. The synaptic value $m_{ij}$ soon equals the pattern piece $S_i(x_i)$ or $x_i$. No other synapse in the network encodes this pattern piece.

Competitive learning is not distributed. We can omit the learning metaphor and directly copy the sample pattern $\mathbf{x}$ [...]

$$\|\mathbf{x} - \mathbf{m}_j\|^2 = \min_k \|\mathbf{x} - \mathbf{m}_k\|^2 \tag{4-171}$$

where the squared Euclidean or $l^2$ vector norm $\|\mathbf{z}\|^2$ equals the sum of squared entries:

$$\|\mathbf{z}\|^2 = z_1^2 + \cdots + z_n^2 = \mathbf{z}\mathbf{z}^T \tag{4-172}$$

Then competitive learning reduces to classical correlation detection of signals [Cooper, 1986], provided all synaptic vectors $\mathbf{m}_j$ have approximately the same norm value $\|\mathbf{m}_j\|$. Kohonen [1988] has suggested that synaptic vectors are asymptotically equinorm. The centroid argument below shows, though, that the equinorm property implicitly includes an equiprobable synaptic allocation or vector quantization of $R^n$ as well as other assumptions. We need the equinorm property along the way to asymptotic convergence as well as after the synaptic vectors converge.

In practice competitive learning systems tend to exhibit the equiprobable quantization and equinorm properties after training. Hecht-Nielsen [1987] and others simply require that all input vectors $\mathbf{x}$ and synaptic vectors $\mathbf{m}_j$ be normalized to points on the unit $R^n$ sphere: $\|\mathbf{x}\|^2 = \|\mathbf{m}_j\|^2 = 1$. Such normalization requires that we renormalize the winning synaptic vector $\mathbf{m}_j$ at each iteration in the stochastic-difference algorithm (4-124).
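One training step under the metrical rule with unit-sphere normalization might look like the following sketch. Algorithm (4-124) is not reproduced in this section, so the update `m_j + rate * (x - m_j)` is an assumed discretization of the competitive law (4-165) for the winning neuron; the function names, the learning rate, and the sample vectors are hypothetical.

```python
import math

def normalize(v):
    """Scale v to unit Euclidean norm."""
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v]

def squared_distance(x, m):
    """Squared Euclidean distance ||x - m||^2."""
    return sum((a - b) ** 2 for a, b in zip(x, m))

def winner_index(x, synaptic_vectors):
    """Index j of the synaptic vector closest to x (the metrical rule)."""
    distances = [squared_distance(x, m) for m in synaptic_vectors]
    return distances.index(min(distances))

def competitive_step(x, synaptic_vectors, rate=0.1):
    """Move only the winning vector toward x, then renormalize it.

    Assumed update for the winner: m_j <- normalize(m_j + rate * (x - m_j)).
    Losing vectors stay unchanged. Returns the winner's index.
    """
    j = winner_index(x, synaptic_vectors)
    moved = [m + rate * (a - m) for a, m in zip(x, synaptic_vectors[j])]
    synaptic_vectors[j] = normalize(moved)
    return j

# Hypothetical two-dimensional example on the unit circle.
x = normalize([1.0, 0.2])
vectors = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
before = squared_distance(x, vectors[0])
j = competitive_step(x, vectors)
after = squared_distance(x, vectors[j])
print(j, before, after)
```

Only the winner moves, it stays on the unit sphere after renormalization, and its distance to the sample shrinks.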
To see how metrical competitive learning reduces to correlation detection, suppose at each moment the synaptic vectors $\mathbf{m}_k$ have the same positive and finite norm value:

$$\|\mathbf{m}_1\|^2 = \cdots = \|\mathbf{m}_p\|^2 \tag{4-173}$$

Our argument will not require that the norms in (4-173) equal the same constant at every instant, though they often do. From (4-171) the $j$th competing neuron wins the competition if and only if

$$\|\mathbf{x} - \mathbf{m}_j\|^2 = \min_k \|\mathbf{x} - \mathbf{m}_k\|^2 \tag{4-174}$$
$$\text{iff} \quad (\mathbf{x} - \mathbf{m}_j)(\mathbf{x} - \mathbf{m}_j)^T = \min_k\,(\mathbf{x} - \mathbf{m}_k)(\mathbf{x} - \mathbf{m}_k)^T \tag{4-175}$$
$$\text{iff} \quad \mathbf{x}\mathbf{x}^T + \mathbf{m}_j\mathbf{m}_j^T - 2\,\mathbf{x}\mathbf{m}_j^T = \min_k\,(\mathbf{x}\mathbf{x}^T + \mathbf{m}_k\mathbf{m}_k^T - 2\,\mathbf{x}\mathbf{m}_k^T) \tag{4-176}$$
$$\text{iff} \quad \|\mathbf{m}_j\|^2 - 2\,\mathbf{x}\mathbf{m}_j^T = \min_k\,(\|\mathbf{m}_k\|^2 - 2\,\mathbf{x}\mathbf{m}_k^T) \tag{4-177}$$

We now subtract $\|\mathbf{m}_j\|^2$ from both sides of (4-177) and invoke the equinorm property (4-173) to get the equivalent equality

$$-2\,\mathbf{x}\mathbf{m}_j^T = \min_k\,(-2\,\mathbf{x}\mathbf{m}_k^T) \tag{4-178}$$

The DeMorgan's law $\max(a, b) = -\min(-a, -b)$ further simplifies (4-178) to produce the correlation-detection equality

$$\mathbf{x}\mathbf{m}_j^T = \max_k\,\mathbf{x}\mathbf{m}_k^T \tag{4-179}$$

The $j$th competing neuron wins iff the input signal or pattern $\mathbf{x}$ correlates maximally with $\mathbf{m}_j$. We minimize Euclidean distance when we maximize correlations or inner products, and conversely, if the equinorm property (4-173) holds. From a neural-network perspective, this argument reduces neuronal competition to a winner-take-all linear additive activation model.

Correlations favor optical implementation. Optical systems cannot easily compute differences. Instead they excel at computing large-scale parallel multiplications and additions. Small-scale software or hardware implementations can more comfortably afford the added accuracy achieved by using the metrical classification rule (4-170).

The cosine law

$$\mathbf{x}\mathbf{m}_j^T = \|\mathbf{x}\|\,\|\mathbf{m}_j\|\cos(\mathbf{x}, \mathbf{m}_j) \tag{4-180}$$

provides a geometric interpretation of metrical competitive learning when the equinorm property (4-173) holds. The term $\cos(\mathbf{x}, \mathbf{m}_j)$ denotes the cosine of the angle between the two $R^n$ vectors $\mathbf{x}$ and $\mathbf{m}_j$.
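The equivalence between (4-174) and (4-179), and its dependence on the equinorm property, can be checked on small hand-picked vectors. This is a minimal sketch with hypothetical data and function names; it is meant only to illustrate the derivation above.

```python
def inner(x, m):
    """Inner product x m^T for row vectors given as lists."""
    return sum(a * b for a, b in zip(x, m))

def argmin_distance(x, vectors):
    """Winner under the metrical rule: smallest squared Euclidean distance."""
    distances = [sum((a - b) ** 2 for a, b in zip(x, m)) for m in vectors]
    return distances.index(min(distances))

def argmax_correlation(x, vectors):
    """Winner under correlation detection: largest inner product."""
    correlations = [inner(x, m) for m in vectors]
    return correlations.index(max(correlations))

x = [1.0, 2.0]

# All three synaptic vectors have squared norm 25, so (4-173) holds.
equinorm = [[3.0, 4.0], [5.0, 0.0], [-4.0, 3.0]]
same_winner = (argmin_distance(x, equinorm), argmax_correlation(x, equinorm))

# Here the norms differ, and the two rules pick different winners.
unequal = [[3.0, 4.0], [0.5, 1.0]]
different_winner = (argmin_distance(x, unequal), argmax_correlation(x, unequal))

print(same_winner, different_winner)
```

In the second case the short vector lies closest to the sample, but the long vector has the larger inner product, which is why the correlation rule needs the equinorm assumption.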
The cosine law implies that the $j$th neuron wins iff the input pattern $\mathbf{x}$ is more parallel to synaptic vector $\mathbf{m}_j$ than to any other $\mathbf{m}_k$. Parallel vectors maximize the cosine function and yield $\cos(\mathbf{x}, \mathbf{m}_j) = 1$. Suppose all pattern vectors $\mathbf{x}$ and synaptic vectors $\mathbf{m}_j$ lie on the $R^n$ unit sphere $S^n = \{\mathbf{z} \in R^n : \|\mathbf{z}\|^2 = 1\}$. So $\|\mathbf{x}\|^2 = \|\mathbf{m}_j\|^2 = 1$. Then the cosine law confirms the intuition that the $j$th neuron should maximally win the competition if the current sample $\mathbf{x}$ equals $\mathbf{m}_j$, and should maximally lose if $\mathbf{x}$ equals $-\mathbf{m}_j$ and thus lies on the opposite side of $S^n$. All other samples $\mathbf{x}$ on $S^n$ produce win intensities between these extremes.

DIFFERENTIAL HEBBIAN LEARNING

The deterministic differential Hebbian learning law

$$\dot{m}_{ij} = -m_{ij} + S_i S_j + \dot{S}_i \dot{S}_j$$

and its simpler version

$$\dot{m}_{ij} = -m_{ij} + \dot{S}_i \dot{S}_j$$

arose from attempts to dynamically estimate the causal structure of fuzzy cognitive maps from sample data [Kosko, 1985, 1986, 1987(c)]. This led to abstract mathematical analyses, which in turn led to neural interpretations of these signal-velocity learning laws, as we shall see. Intuitively, Hebbian correlations promote spurious causal associations among concurrently active units. Differential correlations estimate the concurrent, and presumably causal, variation among active units.

Fuzzy Cognitive Maps

Fuzzy cognitive maps (FCMs) are fuzzy signed directed graphs with feedback. The directed edge $e_{ij}$ from causal concept $C_i$ to concept $C_j$ measures how much $C_i$ causes $C_j$. The time-varying concept function $C_i(t)$ measures the nonnegative occurrence of some fuzzy event, perhaps the strength of a political sentiment, historical trend, or military objective.

FCMs model the world as a collection of classes and causal relations between classes. Taber [1987, 1991] has used FCMs to model gastric-appetite behavior and popular political developments. Styblinski [1988] has used FCMs to analyze electrical circuits.
Zhang [1988] has used FCMs to analyze and extend graph-theoretic behavior. Gotoh and Murakami [1989] have used FCMs to model plant control.

The edges $e_{ij}$ take values in the fuzzy causal interval $[-1, 1]$. $e_{ij} = 0$ indicates no causality. $e_{ij} > 0$ indicates causal increase: $C_j$ increases as $C_i$ increases, and $C_j$ decreases as $C_i$ decreases. $e_{ij} < 0$ indicates causal decrease or negative causality: $C_j$ decreases as $C_i$ increases, and $C_j$ increases as $C_i$ decreases. For instance, freeway congestion and bad weather increase automobile accidents on freeways. Auto accidents decrease the patrolling frequency of highway patrol officers. Increased highway-patrol frequency decreases average driving speed.

Simple FCMs have edge values in $\{-1, 0, 1\}$. Then, if causality occurs, it occurs to maximal positive or negative degree. Simple FCMs provide a quick first approximation to an expert's stated or printed causal knowledge. For instance, a syndicated article on South African politics by political economist Walter Williams [1986] led to the simple FCM depicted in Figure 4.2.

Feedback loops abound in FCMs in thick tangles. Feedback precludes [...] used in artificial-intelligence expert systems. [...] algorithms tend to get stuck in infinite loops in cyclic knowledge [...]
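The freeway example can be written down as a simple FCM edge matrix with entries in $\{-1, 0, 1\}$. The sketch below is a hypothetical encoding: the concept names, the helper functions, and in particular the `path_sign` convention (multiplying edge signs along a causal path) are illustrative assumptions and are not defined in the text above.

```python
CONCEPTS = [
    "freeway congestion",
    "bad weather",
    "auto accidents",
    "patrol frequency",
    "driving speed",
]

# Signed causal edges read off the freeway example: +1 causal increase,
# -1 causal decrease. Pairs not listed default to 0 (no causality).
SIGNED_EDGES = {
    ("freeway congestion", "auto accidents"): 1,
    ("bad weather", "auto accidents"): 1,
    ("auto accidents", "patrol frequency"): -1,
    ("patrol frequency", "driving speed"): -1,
}

def simple_fcm(concepts, signed_edges):
    """Build the edge matrix E with E[i][j] = e_ij in {-1, 0, 1}."""
    index = {name: k for k, name in enumerate(concepts)}
    n = len(concepts)
    edges = [[0] * n for _ in range(n)]
    for (cause, effect), sign in signed_edges.items():
        if sign not in (-1, 0, 1):
            raise ValueError("simple FCM edges must lie in {-1, 0, 1}")
        edges[index[cause]][index[effect]] = sign
    return index, edges

def path_sign(index, edges, path):
    """Product of edge signs along a path (assumed convention, see lead-in).

    Returns 0 if any consecutive pair on the path has no causal edge.
    """
    sign = 1
    for cause, effect in zip(path, path[1:]):
        sign *= edges[index[cause]][index[effect]]
    return sign

index, edges = simple_fcm(CONCEPTS, SIGNED_EDGES)
chain = ["bad weather", "auto accidents", "patrol frequency", "driving speed"]
print(edges[index["auto accidents"]][index["patrol frequency"]])
print(path_sign(index, edges, chain))
```

Under the assumed sign-product convention the two negative edges on the chain cancel, so the matrix reads bad weather as indirectly raising driving speed. This example has no cycles; the feedback loops the text mentions would appear as nonzero entries both above and below the diagonal.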

