Fuzzy Modeling
Introduction:
Classical approach:
- Low accuracy in complicated systems
- Systems for which first-principles and theoretical methods are not fully developed

Solution:
1) Imitate human parallel processing → neural networks
2) Imitate human reasoning and inference → fuzzy models
- Although neural networks have many advantages, they have three main problems:
1) The knowledge is saved in parameters which are not interpretable.
2) Training requires solving a nonlinear optimization problem.
3) Capturing expert knowledge is impossible.
Fuzzy models
- A mathematical model which in some way uses fuzzy sets is called a fuzzy model [1].
- A method for modeling complex, ill-defined, and less tractable systems, built from rules of the form: if (antecedent) then (consequent).
Example (Mamdani):
[figure: a Mamdani rule, showing the input fuzzy sets, the validity (firing strength) of the rule, and the rule output]
- Two different ideas lie behind these modeling approaches: while the Mamdani model tries to imitate the human reasoning mechanism, the Takagi-Sugeno model tries to represent a system by several simple local models when it cannot be described accurately by a single model. For this reason the Takagi-Sugeno model is sometimes called a local model.
LOLIMOT (Nelles)
Clustering (Babuška)

ANFIS (Adaptive-Network-Based Fuzzy Inference System)
Main problems of fuzzy modeling before ANFIS:
1) No standard methods exist for transforming human knowledge or experience into the rule base and database of a fuzzy inference system.
2) There is a need for effective methods for tuning the membership functions (MFs) so as to minimize the output error measure or maximize a performance index.
Neural networks
Neuron structure:
Output of neuron:

$$y_k = \varphi\Big( \sum_{j=1}^{n} w_{kj} x_j + b_k \Big)$$
Layer-by-layer computation of the MLP output:

$$y_j^{(1)} = \varphi\Big( \sum_i w_{ji}^{(1)} x_i + b_j^{(1)} \Big), \qquad y_j^{(l)} = \varphi\Big( \sum_i w_{ji}^{(l)} y_i^{(l-1)} + b_j^{(l)} \Big)$$
- Cost function:

$$E = \frac{1}{2} \sum_{k=1}^{N} e_k^2, \qquad e_k = d_k - y_k$$
- Optimization algorithm (steepest descent): the search direction is the opposite of the gradient direction,

$$\Delta w = -\eta \, \frac{\partial E}{\partial w},$$

where $\partial E / \partial w$ is the gradient of the output error with respect to the weight $w$. The most important advantage of this algorithm is that the gradient for each weight can be calculated with the aid of the gradients of the neurons in the next layer (error backpropagation).
- Training procedure:
It is a two-pass optimization method. In the forward pass the inputs go through the MLP, and the outputs $y_k$ and the error $E$ are calculated. In the backward pass the error propagates from the output layer to the input layer, and all of the MLP's weights are updated. This procedure is repeated over all data samples many times.
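A minimal numpy sketch of this two-pass procedure for a one-hidden-layer MLP; the architecture, toy data, and learning rate are illustrative choices, not taken from the slides:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: one input, one desired output (illustrative only)
x = rng.uniform(-1.0, 1.0, size=(100, 1))
d = np.sin(np.pi * x)

# One hidden layer with tanh activation, linear output layer
W1 = rng.normal(0.0, 0.5, size=(1, 10));  b1 = np.zeros(10)
W2 = rng.normal(0.0, 0.5, size=(10, 1));  b2 = np.zeros(1)
eta = 0.05                                  # learning rate (assumed)

for epoch in range(2000):
    # Forward pass: compute layer outputs and the output error
    h = np.tanh(x @ W1 + b1)                # hidden-layer outputs
    y = h @ W2 + b2                         # network outputs
    e = y - d                               # output error
    # Backward pass: propagate the error, accumulate gradients
    grad_W2 = h.T @ e / len(x)
    grad_b2 = e.mean(axis=0)
    delta_h = (e @ W2.T) * (1.0 - h ** 2)   # tanh derivative
    grad_W1 = x.T @ delta_h / len(x)
    grad_b1 = delta_h.mean(axis=0)
    # Steepest descent: step opposite to the gradient direction
    W1 -= eta * grad_W1;  b1 -= eta * grad_b1
    W2 -= eta * grad_W2;  b2 -= eta * grad_b2
```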
1. Compare the input variables with the membership functions in the premise part to obtain the membership values (or compatibility measures) of each linguistic label (this step is often called fuzzification).
2. Combine (through a specific T-norm operator, usually multiplication or min) the membership values in the premise part to get the firing strength (weight) of each rule.
3. Generate the qualified consequent (either fuzzy or crisp) of each rule depending on the firing strength.
4. Aggregate the qualified consequents to produce a crisp output (this step is called defuzzification).
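A minimal sketch of the four steps for a two-rule Mamdani FIS, assuming triangular membership functions, min T-norm, max aggregation, and centroid defuzzification (all membership function parameters are illustrative):

```python
import numpy as np

def trimf(x, a, b, c):
    """Triangular membership function with feet a, c and peak b."""
    return np.maximum(np.minimum((x - a) / (b - a), (c - x) / (c - b)), 0.0)

def mamdani(x1, x2):
    # Step 1: fuzzification -- membership values of each linguistic label
    mu_A1, mu_A2 = trimf(x1, 0, 1, 2), trimf(x1, 1, 2, 3)
    mu_B1, mu_B2 = trimf(x2, 0, 1, 2), trimf(x2, 1, 2, 3)
    # Step 2: firing strengths (min T-norm)
    w1, w2 = min(mu_A1, mu_B1), min(mu_A2, mu_B2)
    # Step 3: qualified fuzzy consequents (clip output sets at firing strength)
    y = np.linspace(0.0, 3.0, 301)           # discretized output universe
    out1 = np.minimum(w1, trimf(y, 0, 1, 2))
    out2 = np.minimum(w2, trimf(y, 1, 2, 3))
    # Step 4: aggregate (max) and defuzzify (centroid)
    agg = np.maximum(out1, out2)
    return np.sum(y * agg) / np.sum(agg)

print(mamdani(1.2, 1.8))
```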
- Example: a first-order TSK model with two rules,

Rule 1: if $x$ is $A_1$ and $y$ is $B_1$ then $f_1 = p_1 x + q_1 y + r_1$
Rule 2: if $x$ is $A_2$ and $y$ is $B_2$ then $f_2 = p_2 x + q_2 y + r_2$
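A sketch of evaluating such a two-rule first-order TSK model; the Gaussian membership functions and all numeric parameter values are illustrative assumptions:

```python
import numpy as np

def gauss(x, c, s):
    """Gaussian membership function with center c and width s."""
    return np.exp(-0.5 * ((x - c) / s) ** 2)

def tsk(x, y):
    # Firing strengths of the two rules (product T-norm)
    w1 = gauss(x, -1.0, 1.0) * gauss(y, -1.0, 1.0)   # x is A1 and y is B1
    w2 = gauss(x,  1.0, 1.0) * gauss(y,  1.0, 1.0)   # x is A2 and y is B2
    # Linear (first-order) consequents f_i = p_i x + q_i y + r_i
    f1 = 0.5 * x + 0.3 * y + 1.0
    f2 = -0.2 * x + 0.8 * y - 1.0
    # Normalized weighted average of the rule outputs
    return (w1 * f1 + w2 * f2) / (w1 + w2)

print(tsk(0.2, -0.4))
```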
Type 1: Mamdani
Type 2: TSK
Example of a FIS with two inputs and three membership functions for each input
Training procedure:
The two passes in the hybrid learning procedure for ANFIS:

|                       | Forward pass           | Backward pass    |
|-----------------------|------------------------|------------------|
| Premise parameters    | Fixed                  | Gradient descent |
| Consequent parameters | Least squares estimate | Fixed            |
| Signals               | Node outputs           | Error rates      |
Why the least squares algorithm can be used for the consequent parameters (e.g., for the TSK example above): with fixed premise parameters, the output is

$$f = \frac{w_1}{w_1 + w_2} f_1 + \frac{w_2}{w_1 + w_2} f_2 = \bar{w}_1 f_1 + \bar{w}_2 f_2 = (\bar{w}_1 x)p_1 + (\bar{w}_1 y)q_1 + \bar{w}_1 r_1 + (\bar{w}_2 x)p_2 + (\bar{w}_2 y)q_2 + \bar{w}_2 r_2,$$

which is linear in the consequent parameters $\theta = (p_1, q_1, r_1, p_2, q_2, r_2)^T$. Stacking all training samples gives a linear regression problem $y = X\theta$, where each row of the regressor matrix $X$ is $(\bar{w}_1 x, \ \bar{w}_1 y, \ \bar{w}_1, \ \bar{w}_2 x, \ \bar{w}_2 y, \ \bar{w}_2)$.
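A numpy sketch of solving this regression with a least squares routine; the membership functions and the toy data are illustrative, only the regressor columns follow the formulation above:

```python
import numpy as np

rng = np.random.default_rng(0)

# N samples of the two inputs and the desired output (illustrative data)
x, y = rng.uniform(-1, 1, 200), rng.uniform(-1, 1, 200)
d = 1.0 + 0.5 * x - 0.3 * y + 0.01 * rng.normal(size=200)

def gauss(u, c, s):
    return np.exp(-0.5 * ((u - c) / s) ** 2)

# Normalized firing strengths of the two rules (premise part fixed)
w1 = gauss(x, -1, 1) * gauss(y, -1, 1)
w2 = gauss(x,  1, 1) * gauss(y,  1, 1)
wb1, wb2 = w1 / (w1 + w2), w2 / (w1 + w2)

# Regressor matrix: columns (w1*x, w1*y, w1, w2*x, w2*y, w2), bars omitted
X = np.column_stack([wb1 * x, wb1 * y, wb1, wb2 * x, wb2 * y, wb2])
theta, *_ = np.linalg.lstsq(X, d, rcond=None)   # [p1, q1, r1, p2, q2, r2]
print(theta)
```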
- In the backward pass the gradient descent algorithm is used to optimize the premise parameters while the error propagates backward through the network (like backpropagation in neural networks).
Remark 1: since the consequent parameters are re-optimized at each iteration with the least squares algorithm, the nonlinear optimization problem handled in the backward pass can be solved more efficiently, and problems such as being trapped in local minima or slow convergence are less severe.
- Remark 2: the TSK model is more popular in the ANFIS structure since it has more adjustable parameters in the rule consequents. This reduces the training time and effort, because the output is linear in these parameters, so they can be estimated very efficiently through the least squares algorithm.
- Remark 3: sometimes optimizing the premise parameters (input membership functions) deteriorates the interpretability of the rule base.
Example:
$$y = 0.6 \sin(\pi x) + 0.3 \sin(3\pi x) + 0.1 \sin(5\pi x), \qquad x \in [-1, 1]$$
[figure: trained membership functions, showing loss of interpretability]
FUREGA
Fuzzy Rule Extraction using Genetic Algorithm
FUREGA:
1) Start with a grid-based network using prior knowledge.
2) Rule selection by genetic algorithm.
3) Least squares for output (consequent) parameter optimization.
4) Constrained nonlinear optimization of the membership functions.
Properties:
- Hopeful to reach the best solution (accuracy)
- Time-consuming training
- Curse of dimensionality
- Interpretability?
LOLIMOT
Local Linear Model Tree
Example:
LOLIMOT algorithm:
- The algorithm has an outer loop (upper level) that determines the input partitions (structure) where the local linear models are valid, and an inner loop (lower level) that estimates the parameters of those local linear models by an efficient weighted least squares algorithm.
Consequent parameter estimation: the model output is a weighted superposition of $M$ local affine models,

$$\hat{y} = \sum_{i=1}^{M} \big( w_{i0} + w_{i1} u_1 + \dots + w_{in} u_n \big)\, \Phi_i(u, c_i, \sigma_i)$$

$u$: input vector
$\Phi_i$: normalized Gaussian weighting function for the $i$-th model, with center coordinates $c_i$ and standard deviations $\sigma_i$
$$\Phi_i(u, c_i, \sigma_i) = \frac{\mu_i(u)}{\sum_{j=1}^{M} \mu_j(u)}, \quad \text{where} \quad \mu_i(u) = \exp\left( -\frac{1}{2} \left( \frac{(u_1 - c_{i1})^2}{\sigma_{i1}^2} + \dots + \frac{(u_n - c_{in})^2}{\sigma_{in}^2} \right) \right)$$
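A small numpy sketch of these validity functions; the (M, n) array layout for the centers and standard deviations is an assumed convention:

```python
import numpy as np

def validity(u, centers, sigmas):
    """Normalized Gaussian validity functions Phi_i(u), i = 1..M.

    u: (n,) input vector; centers, sigmas: (M, n) arrays.
    """
    mu = np.exp(-0.5 * np.sum(((u - centers) / sigmas) ** 2, axis=1))
    return mu / mu.sum()     # normalize over all M local models
```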
First the validity functions $\Phi_i$ are determined. Then the parameters of each local linear model are estimated separately by a weighted least squares technique. With the data matrix $X$ (model inputs, known), the diagonal weighting matrix $Q_i$ (each entry is the value of the $i$-th weighting function at the corresponding input sample), and the desired outputs $y$, the optimal parameters of the $i$-th model are

$$\hat{w}_i = (X^T Q_i X)^{-1} X^T Q_i y.$$
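A direct numpy transcription of this estimate (a sketch; forming the full diagonal matrix is acceptable for moderate N):

```python
import numpy as np

def wls(X, q, y):
    """Weighted least squares for one local linear model.

    X: (N, n+1) data matrix with a leading column of ones (offset w_i0),
    q: (N,) validity-function values of this model at each sample,
    y: (N,) desired outputs.  Returns w_i = (X^T Q X)^{-1} X^T Q y.
    """
    Q = np.diag(q)
    return np.linalg.solve(X.T @ Q @ X, X.T @ Q @ y)
```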
- Input space partitioning:
1. Set the first hyper-rectangle in such a way that it contains all data points. Estimate a global linear model.
2. For all input dimensions j = 1, ..., n:
   2a. Cut the hyper-rectangle into two halves along dimension j.
   2b. Estimate local linear models for each half.
   2c. Calculate the global approximation error (output error) of the model with this cut.
3. Determine which cut has led to the smallest approximation error.
4. Perform this cut. Place a weighting function at the center of each of the two hyper-rectangles. Set the standard deviations of both weighting functions proportional to the extension of the hyper-rectangle in each dimension. Apply the corresponding estimated local linear models (from 2b).
5. Calculate the local error measures J for each hyper-rectangle on the basis of a parallel running model.
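A simplified, self-contained sketch of this procedure. Unlike full LOLIMOT, which refines only the worst local model, this version tries halving every current hyper-rectangle along every dimension and keeps the cut with the smallest global output error; the proportionality factor 0.4 for the standard deviations and the toy data are assumptions:

```python
import numpy as np

def validity(U, rects):
    """Normalized Gaussian weights; centers at rectangle midpoints,
    standard deviations proportional (factor 0.4) to the extents."""
    mus = [np.exp(-0.5 * np.sum(((U - (lo + hi) / 2) / (0.4 * (hi - lo))) ** 2,
                                axis=1)) for lo, hi in rects]
    mus = np.array(mus)
    return mus / mus.sum(axis=0)

def fit(U, y, rects):
    """Weighted LS for each local affine model; returns global output error."""
    Phi = validity(U, rects)
    X = np.column_stack([np.ones(len(U)), U])
    yhat = np.zeros(len(U))
    for q in Phi:
        w = np.linalg.solve(X.T @ (q[:, None] * X), X.T @ (q * y))
        yhat += q * (X @ w)
    return np.sum((y - yhat) ** 2)

rng = np.random.default_rng(0)
U = rng.uniform(0, 1, size=(300, 2))                 # toy inputs
y = np.sin(3 * U[:, 0]) + U[:, 1] ** 2               # toy target

rects = [(np.zeros(2), np.ones(2))]                  # step 1: global rectangle
for _ in range(4):
    best_J, best_rects = np.inf, None
    for k, (lo, hi) in enumerate(rects):             # candidate rectangles
        for j in range(2):                           # step 2: cut along dim j
            m_hi, m_lo = hi.copy(), lo.copy()
            m_hi[j] = m_lo[j] = (lo[j] + hi[j]) / 2
            trial = rects[:k] + rects[k + 1:] + [(lo, m_hi), (m_lo, hi)]
            J = fit(U, y, trial)                     # step 2c: global error
            if J < best_J:
                best_J, best_rects = J, trial
    rects = best_rects                               # steps 3-4: best cut
    print(len(rects), "local models, J =", best_J)
```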
Example:
Properties:
- High interpretability of rules
- Structure optimization
- Low sensitivity to user-selected parameters
- No curse of dimensionality for high-dimensional problems
Implementing Hierarchical Fuzzy Clustering in Fuzzy Identification Using Weighted Fuzzy C-Means
Clustering
- Definition: dividing the data set in such a way that objects belonging to the same cluster are as similar as possible and objects belonging to different clusters are as dissimilar as possible.
- Types: 1) crisp, 2) fuzzy.
- Properties: 1) unsupervised learning task; 2) nonlinear optimization; 3) computational economy; 4) needs user-defined parameters.
As the fuzzifier $m \to 1$, the fuzzy memberships approach crisp (0/1) assignments; larger values of m give fuzzier partitions.
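A minimal numpy sketch of the standard fuzzy C-means updates (initialization and stopping are simplified); setting m close to 1 in it shows the memberships collapsing to crisp assignments:

```python
import numpy as np

def fcm(X, C, m=2.0, iters=100, seed=0):
    """Standard fuzzy C-means: alternate membership and center updates."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), C, replace=False)]
    for _ in range(iters):
        # Squared distances of every sample to every center, (N, C)
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1) + 1e-12
        # Membership update: u_ik proportional to d_ik^(-2/(m-1))
        u = d2 ** (-1.0 / (m - 1.0))
        u /= u.sum(axis=1, keepdims=True)
        # Center update: weighted mean with weights u_ik^m
        um = u ** m
        centers = (um.T @ X) / um.sum(axis=0)[:, None]
    return centers, u
```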
SOM algorithm:
1. Choose initial values for the C neuron vectors $c_i$, $i = 1, \dots, C$. This can be done by picking C different data samples at random.
2. Choose one sample u from the data set, either randomly or by systematically going through the whole data set.
3. Calculate the distance of the selected data sample to all neuron vectors (typically the Euclidean distance measure is used). The neuron with the vector closest to the data sample is called the winner neuron.
4. Update the vector of the winner neuron so that it moves toward the selected data sample u: $c_{win} \leftarrow c_{win} + \alpha \,(u - c_{win})$.
5. If any neuron vector has been moved significantly in the previous step, go to Step 2; otherwise stop.
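A minimal numpy sketch of these five steps; since the algorithm above uses no neighborhood function, this amounts to online competitive learning (the learning rate and stopping tolerance are illustrative):

```python
import numpy as np

def som(X, C, alpha=0.1, epochs=50, tol=1e-6, seed=0):
    """Competitive-learning version of the five-step algorithm above."""
    rng = np.random.default_rng(seed)
    c = X[rng.choice(len(X), C, replace=False)].copy()   # step 1: init
    for _ in range(epochs):
        moved = 0.0
        for u in X[rng.permutation(len(X))]:             # step 2: pick sample
            k = np.argmin(((c - u) ** 2).sum(axis=1))    # step 3: winner
            step = alpha * (u - c[k])                    # step 4: move winner
            c[k] += step
            moved = max(moved, np.abs(step).max())
        if moved < tol:                                  # step 5: stop test
            break
    return c
```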
Properties:
- Input space term-sets are derived as a direct result of the clustering process.
- Computational economy.
Algorithm:
1) Apply the SOM algorithm to classify the N data samples into n crisp clusters ($S_i$, $i = 1, \dots, n$).
2) Select the n cluster centers ($v_i$, $i = 1, \dots, n$) from the previous step and assign a weight to each of them according to its relative cardinality: $w_i = |S_i| / N$.
3) Apply WFCM to classify the n weighted cluster centers $(v_i, w_i)$ into C new clusters (see the sketch after this algorithm).
4) The centers of the Gaussian membership functions in the premises of the fuzzy rules are obtained by simply projecting the final cluster centers onto each axis. To calculate the respective standard deviations, utilize the fuzzy covariance matrix [6].
5) Use weighted least squares to optimize the consequent parameters and steepest descent for the premise parameters (formulas in [6]).
6) Merge similar membership functions for interpretability, using a similarity measure such as $S(A, B) = |A \cap B| \,/\, |A \cup B|$.
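A sketch of the WFCM used in step 3, assuming the standard weighted variant in which the sample weights multiply the membership terms in the center update (memberships are updated as in ordinary FCM):

```python
import numpy as np

def wfcm(V, w, C, m=2.0, iters=100, seed=0):
    """Weighted fuzzy C-means on the n SOM centers V with weights w."""
    rng = np.random.default_rng(seed)
    centers = V[rng.choice(len(V), C, replace=False)]
    for _ in range(iters):
        d2 = ((V[:, None, :] - centers[None, :, :]) ** 2).sum(-1) + 1e-12
        u = d2 ** (-1.0 / (m - 1.0))         # membership update, as in FCM
        u /= u.sum(axis=1, keepdims=True)
        uw = (u ** m) * w[:, None]           # sample weights enter the update
        centers = (uw.T @ V) / uw.sum(axis=0)[:, None]
    return centers, u
```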
Example I:
[figure: SOM clustering of the data and WFCM re-clustering of the SOM centers, plotted in the (x1, x2) plane]
Example I (cont.):
[figure: initial term-sets for x1 and the final merged term-sets labeled small, medium, and large; reported error measures J = 0.1801, J = 0.0018, and J = 0.0154]
R1: if x1 is small and x2 is small then y = 17.3 - 2.6 x1 + 1.4 x2
R2: if x1 is medium and x2 is large then y = 7.5 - 2.9 x1 - 0.02 x2
R3: if x1 is large and x2 is small then y = 4.7 + 2.7 x1 - 7.8 x2
R4: if x1 is large and x2 is large then y = 2.8 - 0.2 x1 - 0.2 x2
Example II:
[figure: the predicted output x(t+6) over 500 time samples]
Example II (cont.):
[figure: initial term-sets for x(t-18) over the range 0.4 to 1.3]
Example II (cont.):
[figure: initial term-sets for x(t-6) over the range 0.4 to 1.3; reported error measures J = 0.0166, J = 0.0072, and J = 0.0128]
Properties:
- Computational economy
- Curse of dimensionality
- Interpretability
Universal approximator
Proof: see [7].
References:
1. Babuška, R. and Verbruggen, H. (2003). Neuro-fuzzy methods for nonlinear system identification: a review. Annual Reviews in Control, 27, 73-85.
2. Haykin, S. (1998). Neural Networks: A Comprehensive Foundation. Prentice Hall.
3. Jang, J.-S.R. (1993). ANFIS: Adaptive-network-based fuzzy inference systems. IEEE Transactions on Systems, Man, and Cybernetics, 23(3), 665-685.
4. Nelles, O. and Isermann, R. (1996). Basis function networks for interpolation of local linear models. In: IEEE Conference on Decision and Control (CDC), 470-475.
5. Nelles, O. (2002). Nonlinear System Identification. Springer, Berlin.
6. Oliveira, J.V. and Pedrycz, W. (2007). Advances in Fuzzy Clustering and its Applications. John Wiley & Sons, chapter 12.
7. Espinosa, J., Vandewalle, J., and Wertz, V. (2004). Fuzzy Logic, Identification and Predictive Control. Springer, Berlin.