Chapter 22 Presentation

Network Training Steps

Initialize Weights
Train Network
Analyze Network Performance
Use Network

Selection of Data

Determination of Input Range

Data Preprocessing

Nonlinear transformations, for example:

$$p^t = \frac{1}{p}, \qquad p^t = \exp(p)$$

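As a quick illustration, these transformations can be applied elementwise with NumPy before training; the function names below are just for this sketch:

```python
import numpy as np

def reciprocal_transform(p):
    """Nonlinear preprocessing: p^t = 1/p (assumes p has no zero entries)."""
    return 1.0 / p

def exp_transform(p):
    """Nonlinear preprocessing: p^t = exp(p)."""
    return np.exp(p)

# Example: compress a strongly skewed input feature before training.
p = np.array([0.5, 2.0, 10.0, 100.0])
print(reciprocal_transform(p))                    # [2.    0.5   0.1   0.01]
print(exp_transform(np.array([-1.0, 0.0, 1.0])))
```
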
Importance of Transfer Function

Softmax Transfer Function

$$a_i = f(n_i) = \frac{\exp(n_i)}{\sum_{j=1}^{S} \exp(n_j)}$$

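A minimal NumPy sketch of the softmax computation (subtracting the maximum net input is a common numerical-stability trick and does not change the result):

```python
import numpy as np

def softmax(n):
    """Softmax transfer function: a_i = exp(n_i) / sum_j exp(n_j)."""
    e = np.exp(n - np.max(n))   # shift for numerical stability; result is unchanged
    return e / e.sum()

n = np.array([1.0, 2.0, 3.0])
a = softmax(n)
print(a, a.sum())   # outputs sum to 1, so they can be read as class probabilities
```
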
Missing Data

Problem Types

Fitting (nonlinear regression). Map between a set of inputs and a
corresponding set of targets. (e.g., estimate home prices from tax rate,
pupil/teacher ratio, etc.; estimate emission levels from fuel
consumption and speed; predict body fat level from body
measurements.)
Pattern recognition (classification). Classify inputs into a set of target
categories. (e.g., recognize the vineyard from a chemical analysis of
the wine; classify a tumor as benign or malignant, from uniformity of
cell size, clump thickness and mitosis.)
Clustering (segmentation). Group data by similarity. (e.g., group
customers according to buying patterns, group genes with related
expression patterns.)
Prediction (time series analysis, system identification, filtering or
dynamic modeling). Predict the future value of some time series. (e.g.,
predict the future value of some stock; predict the future value of the
concentration of some chemical; predict outages on the electric grid.)

Choice of Network Architecture

Fitting
Multilayer networks with sigmoid hidden layers and linear output layers.
Radial basis networks.
Pattern Recognition
Multilayer networks with sigmoid hidden layers and sigmoid output
layers.
Radial basis networks.
Clustering
Self-organizing feature map
Prediction
Focused time-delay neural network
NARX network

Prediction Networks

[Diagram: focused time-delay network. A tapped delay line (TDL) on the input p(t) feeds a two-layer network with input weights IW1,1, layer weights LW2,1, and biases b1 and b2; the NARX variant adds a delayed feedback connection (LW1,3) from the network output back to the first layer.]

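To make the tapped delay line concrete, here is a small NumPy sketch that builds the delayed input vectors the first layer would receive; the function name and shapes are illustrative:

```python
import numpy as np

def tapped_delay_line(p, d):
    """Build TDL inputs: row t holds [p(t), p(t-1), ..., p(t-d)].

    Returns an array of shape (len(p) - d, d + 1); the first d samples
    are dropped because their delayed values are unavailable.
    """
    return np.array([p[t - d:t + 1][::-1] for t in range(d, len(p))])

p = np.arange(10.0)          # a scalar time series
X = tapped_delay_line(p, 2)  # each row is one delayed input vector
print(X[:3])                 # [[2. 1. 0.] [3. 2. 1.] [4. 3. 2.]]
```
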
Architecture Specifics

Number of layers/neurons
For a multilayer network, start with two layers. Increase the number of layers if the result is not satisfactory.
Use a reasonably large number of neurons in the hidden layer (e.g., 20). Use early stopping or Bayesian regularization to prevent overfitting.
Number of neurons in the output layer = number of targets. You can use multiple networks instead of multiple outputs.

Input selection
Sensitivity analysis (see later slide)
Bayesian regularization with a separate regularization parameter α for each column of the input weight matrix.

Weight Initialization

Set the magnitude of each row $_i\mathbf{w}$ of the first-layer weight matrix to

$$\| {}_i\mathbf{w} \| = 0.7\,(S^1)^{1/R}$$

with the directions of the rows $_i\mathbf{w}$ and the biases $b_i$ chosen randomly (the Nguyen-Widrow approach).

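A minimal sketch of this initialization in NumPy, under the simplifying assumption that directions and biases are drawn uniformly at random (the published Nguyen-Widrow scheme spaces the biases more deliberately):

```python
import numpy as np

def nguyen_widrow_init(S1, R, seed=0):
    """Rows of W get random directions scaled to magnitude 0.7 * S1**(1/R)."""
    rng = np.random.default_rng(seed)
    mag = 0.7 * S1 ** (1.0 / R)
    W = rng.uniform(-1.0, 1.0, size=(S1, R))
    W *= mag / np.linalg.norm(W, axis=1, keepdims=True)   # normalize each row
    b = rng.uniform(-mag, mag, size=S1)   # simplification: NW spaces biases evenly
    return W, b

W1, b1 = nguyen_widrow_init(S1=20, R=3)
print(np.linalg.norm(W1, axis=1))   # every row magnitude equals 0.7 * 20**(1/3)
```
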
Choice of Training Algorithm

Stopping Criteria

Norm of the gradient (of the mean squared error) less than a pre-specified amount (for example, $10^{-6}$).
Early stopping because the validation error increases.
Maximum number of iterations reached.
Mean square error drops below a specified threshold (not generally a useful method).
Mean square error curve (on a log-log scale) becomes flat for some time (user stop).

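A sketch that combines these criteria into one check; the function name, the default tolerances, and the patience rule for early stopping are all illustrative assumptions:

```python
import numpy as np

def should_stop(grad, val_errors, iteration, mse,
                grad_tol=1e-6, patience=5, max_iter=1000, mse_goal=1e-3):
    """Return the reason training should stop, or None to continue."""
    if np.linalg.norm(grad) < grad_tol:
        return "gradient norm below tolerance"
    recent = val_errors[-(patience + 1):]
    if len(recent) > patience and all(b > a for a, b in zip(recent, recent[1:])):
        return "early stopping: validation error rising"
    if iteration >= max_iter:
        return "maximum number of iterations reached"
    if mse < mse_goal:
        return "mse below threshold (rarely a useful criterion)"
    return None
```
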
Typical Training Curve

[Plot: typical training curve: sum square error (log scale, $10^{-3}$ to $10^{3}$) versus iteration number (log scale, $10^{0}$ to $10^{3}$).]

Competitive Network Stopping Criteria

Choice of Performance Function

Mean Square Error:

$$F(\mathbf{x}) = \frac{1}{Q S^M} \sum_{q=1}^{Q} (\mathbf{t}_q - \mathbf{a}_q)^T (\mathbf{t}_q - \mathbf{a}_q) = \frac{1}{Q S^M} \sum_{q=1}^{Q} \sum_{i=1}^{S^M} (t_{i,q} - a_{i,q})^2$$

Minkowski Error:

$$F(\mathbf{x}) = \frac{1}{Q S^M} \sum_{q=1}^{Q} \sum_{i=1}^{S^M} |t_{i,q} - a_{i,q}|^K$$

Cross-Entropy:

$$F(\mathbf{x}) = -\sum_{q=1}^{Q} \sum_{i=1}^{S^M} t_{i,q} \ln \frac{a_{i,q}}{t_{i,q}}$$

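All three performance functions translate directly into NumPy; here t and a are arrays of targets and outputs with the same shape, and skipping zero targets in the cross-entropy is an implementation choice:

```python
import numpy as np

def mean_square_error(t, a):
    """F = (1 / (Q * S_M)) * sum of (t_iq - a_iq)^2; t, a have shape (S_M, Q)."""
    return np.mean((t - a) ** 2)

def minkowski_error(t, a, K=1.5):
    """F = (1 / (Q * S_M)) * sum of |t_iq - a_iq|^K; K = 2 recovers the MSE."""
    return np.mean(np.abs(t - a) ** K)

def cross_entropy(t, a, eps=1e-12):
    """F = -sum of t_iq * ln(a_iq / t_iq); zero targets contribute nothing."""
    mask = t > 0
    return -np.sum(t[mask] * np.log((a[mask] + eps) / t[mask]))
```
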
Committees of Networks

Post-Training Analysis

Fitting
Pattern Recognition
Clustering
Prediction

Fitting

Regression analysis (outputs vs. targets). Fit a line $a_q = m t_q + c + \varepsilon_q$ with

$$m = \frac{\sum_{q=1}^{Q} (t_q - \bar{t})(a_q - \bar{a})}{\sum_{q=1}^{Q} (t_q - \bar{t})^2}, \qquad c = \bar{a} - m\bar{t}$$

where $\bar{t} = \frac{1}{Q}\sum_{q=1}^{Q} t_q$ and $\bar{a} = \frac{1}{Q}\sum_{q=1}^{Q} a_q$.

R value ($-1 \le R \le 1$):

$$R = \frac{\sum_{q=1}^{Q} (t_q - \bar{t})(a_q - \bar{a})}{(Q - 1)\, s_t\, s_a}$$

with

$$s_t = \sqrt{\frac{1}{Q-1}\sum_{q=1}^{Q} (t_q - \bar{t})^2}, \qquad s_a = \sqrt{\frac{1}{Q-1}\sum_{q=1}^{Q} (a_q - \bar{a})^2}$$

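The regression coefficients and R value above, computed directly in NumPy (the function name is illustrative):

```python
import numpy as np

def regression_analysis(t, a):
    """Fit a = m*t + c and compute the correlation coefficient R."""
    t_bar, a_bar = t.mean(), a.mean()
    m = np.sum((t - t_bar) * (a - a_bar)) / np.sum((t - t_bar) ** 2)
    c = a_bar - m * t_bar
    Q = len(t)
    s_t = np.sqrt(np.sum((t - t_bar) ** 2) / (Q - 1))
    s_a = np.sqrt(np.sum((a - a_bar) ** 2) / (Q - 1))
    R = np.sum((t - t_bar) * (a - a_bar)) / ((Q - 1) * s_t * s_a)
    return m, c, R   # R near 1 means outputs track targets closely
```
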
Sample Regression Plot

[Plot: network outputs $a$ versus targets $t$ with the fitted regression line; $R = 0.965$, $R^2 = 0.931$.]

Error Histogram

[Plot: histogram of the prediction errors $e = t - a$, ranging from about $-8$ to $12$.]

Pattern Recognition

Confusion Matrix (rows: output class; columns: target class):

                   Target 1       Target 2
Output 1           47 (22.0%)     1 (0.5%)        97.9% correct / 2.1% incorrect
Output 2           4 (1.9%)       162 (75.7%)     97.6% correct / 2.4% incorrect

The off-diagonal entry in row 1 (1 sample, 0.5%) counts false positives (Type I errors); the off-diagonal entry in row 2 (4 samples, 1.9%) counts false negatives (Type II errors).

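A minimal sketch of building such a matrix from 0-indexed class labels, using the same layout as the table above (rows are output classes, columns are target classes):

```python
import numpy as np

def confusion_matrix(outputs, targets, n_classes):
    """cm[i, j] counts samples assigned to output class i with target class j."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for o, t in zip(outputs, targets):
        cm[o, t] += 1
    return cm

targets = np.array([0, 0, 1, 1, 1, 0])   # hypothetical 0-indexed labels
outputs = np.array([0, 1, 1, 1, 0, 0])
cm = confusion_matrix(outputs, targets, 2)
print(cm)                             # off-diagonal entries are misclassifications
print(np.diag(cm).sum() / cm.sum())   # overall fraction correct
```
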
Pattern Recognition

Receiver Operating Characteristic (ROC) Curve

[Plot: true positive rate versus false positive rate, each ranging from 0 to 1, traced as the decision threshold is varied.]

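A sketch of tracing the ROC curve by sweeping a decision threshold over scalar network outputs, assuming 0/1 labels:

```python
import numpy as np

def roc_points(scores, labels):
    """Trace (FPR, TPR) pairs by sweeping the threshold over the observed scores."""
    P, N = np.sum(labels == 1), np.sum(labels == 0)
    fpr, tpr = [], []
    for th in np.sort(np.unique(scores))[::-1]:
        decided = scores >= th                        # classify as positive
        tpr.append(np.sum(decided & (labels == 1)) / P)
        fpr.append(np.sum(decided & (labels == 0)) / N)
    return np.array(fpr), np.array(tpr)

scores = np.array([0.1, 0.4, 0.35, 0.8])   # hypothetical network outputs
labels = np.array([0, 0, 1, 1])
print(roc_points(scores, labels))
```
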
Clustering (SOM)

Neighborhood function:

$$h_{i,j} = \exp\left( -\frac{\| {}_i\mathbf{w} - {}_j\mathbf{w} \|^2}{2 d^2} \right)$$

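As a one-function NumPy sketch, with rows of W holding the weight vectors (names illustrative):

```python
import numpy as np

def neighborhood(W, i, j, d):
    """h_ij = exp(-||w_i - w_j||^2 / (2 d^2)) for rows i, j of the SOM weight matrix."""
    return np.exp(-np.sum((W[i] - W[j]) ** 2) / (2.0 * d ** 2))

W = np.random.default_rng(0).normal(size=(4, 2))   # 4 prototype vectors in 2-D
print(neighborhood(W, 0, 1, d=1.0))                # near 1 for close prototypes
```
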
Prediction

Autocorrelation function of the prediction errors:

$$R_e(\tau) = \frac{1}{Q - \tau} \sum_{t=1}^{Q - \tau} e(t)\, e(t + \tau)$$

Confidence intervals: if the prediction errors are white noise, as they should be for a well-trained predictor, then for $\tau \ne 0$ the estimate should fall within

$$-\frac{2 R_e(0)}{\sqrt{Q}} < R_e(\tau) < \frac{2 R_e(0)}{\sqrt{Q}}$$

[Plots: error autocorrelation before and after training, for lags $-40$ to $40$; after training, the values at nonzero lags fall within the confidence bounds.]

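A sketch of this check in NumPy; the error series here is a random stand-in, and a well-trained predictor should leave most nonzero lags inside the bound:

```python
import numpy as np

def error_autocorrelation(e, max_lag):
    """R_e(tau) = (1 / (Q - tau)) * sum_t e(t) * e(t + tau)."""
    Q = len(e)
    return np.array([np.sum(e[:Q - tau] * e[tau:]) / (Q - tau)
                     for tau in range(max_lag + 1)])

e = np.random.default_rng(0).normal(size=200)   # stand-in for prediction errors
R = error_autocorrelation(e, 40)
bound = 2 * R[0] / np.sqrt(len(e))              # approximate confidence bound
outside = np.abs(R[1:]) > bound                 # lags suggesting remaining structure
print(outside.sum(), "lags outside the bound")
```
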
Prediction

Cross-correlation between the prediction errors and the input:

$$R_{pe}(\tau) = \frac{1}{Q - \tau} \sum_{t=1}^{Q - \tau} p(t)\, e(t + \tau)$$

Confidence intervals: the errors should be uncorrelated with the input at every lag, so the estimates should fall within

$$-\frac{2 \sqrt{R_e(0)} \sqrt{R_p(0)}}{\sqrt{Q}} < R_{pe}(\tau) < \frac{2 \sqrt{R_e(0)} \sqrt{R_p(0)}}{\sqrt{Q}}$$

[Plots: input-error cross-correlation before and after training, for lags $-40$ to $40$; after training, the values fall within the confidence bounds.]

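The analogous check for input-error cross-correlation, reusing the lag-zero values for the confidence bound (names illustrative):

```python
import numpy as np

def input_error_crosscorrelation(p, e, max_lag):
    """R_pe(tau) = (1 / (Q - tau)) * sum_t p(t) * e(t + tau)."""
    Q = len(e)
    return np.array([np.sum(p[:Q - tau] * e[tau:]) / (Q - tau)
                     for tau in range(max_lag + 1)])

rng = np.random.default_rng(0)
p, e = rng.normal(size=200), rng.normal(size=200)   # stand-in input and errors
R_pe = input_error_crosscorrelation(p, e, 40)
# R_e(0) = mean(e^2) and R_p(0) = mean(p^2) are the lag-zero autocorrelations.
bound = 2 * np.sqrt(np.mean(e**2)) * np.sqrt(np.mean(p**2)) / np.sqrt(len(e))
outside = np.abs(R_pe) > bound   # lags where the input still predicts the error
print(outside.sum(), "lags outside the bound")
```
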
Overfitting and Extrapolation

Diagnosing Problems

Sensitivity Analysis

Check for important inputs. Define the sensitivity

$$s_i^m \equiv \frac{\partial F}{\partial n_i^m}$$

Then

$$\frac{\partial F}{\partial p_j} = \sum_{i=1}^{S^1} \frac{\partial F}{\partial n_i^1} \frac{\partial n_i^1}{\partial p_j} = \sum_{i=1}^{S^1} s_i^1 \frac{\partial n_i^1}{\partial p_j}$$

Since

$$n_i^1 = \sum_{j=1}^{R} w_{i,j}^1 p_j + b_i^1$$

this becomes

$$\frac{\partial F}{\partial p_j} = \sum_{i=1}^{S^1} s_i^1 w_{i,j}^1$$

or, in matrix form,

$$\frac{\partial F}{\partial \mathbf{p}} = (\mathbf{W}^1)^T \mathbf{s}^1$$

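A sketch of the matrix form for a two-layer network with a tansig hidden layer and linear output, taking F as the squared error; averaging $|\partial F / \partial \mathbf{p}|$ over the training set ranks input importance. All names and sizes are illustrative:

```python
import numpy as np

def input_sensitivity(p, t, W1, b1, W2, b2):
    """dF/dp = (W1)^T s1 for a two-layer tansig/linear network, F = (t-a)^T (t-a)."""
    n1 = W1 @ p + b1
    a1 = np.tanh(n1)                  # tansig hidden layer
    a2 = W2 @ a1 + b2                 # linear output layer, so n2 = a2
    s2 = -2 * (t - a2)                # s^2 = dF/dn^2 for squared error
    s1 = (1 - a1 ** 2) * (W2.T @ s2)  # backpropagate through the tanh derivative
    return W1.T @ s1                  # dF/dp

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(5, 3)), np.zeros(5)
W2, b2 = rng.normal(size=(1, 5)), np.zeros(1)
g = input_sensitivity(rng.normal(size=3), np.array([1.0]), W1, b1, W2, b2)
print(np.abs(g))   # larger magnitudes flag more important inputs
```
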