ARTIFICIAL NEURAL NETWORKS IN BI
Information System Dept
ITS Surabaya
2009
Outline
1. Introduction
2. ANN Representations & Architecture
3. Learning in ANN
4. Back-Propagation Algorithm
5. Remarks on the BP algorithm
6. ANN Application Development
7. Benefits and Limitations of ANN
8. ANN Applications
1. Introduction: Biological Motivation
Human brain is a densely interconnected network of approximately 10^11 neurons, each connected to, on average, 10^4 others.
Neuron activity is excited or inhibited through connections to other neurons.
The fastest neuron switching times are known to be on the order of 10^-3 seconds.
Portion of a Network: Two Interconnected Cells
[Figure: two interconnected neuron cells]
The cell itself includes a nucleus (at the center). To the right of cell 2, the dendrites provide input signals to the cell. To the right of cell 1, the axon sends output signals to cell 2 via the axon terminals. These axon terminals merge with the dendrites of cell 2.
Signals can be transmitted unchanged, or they can be altered by synapses.
A synapse is able to increase or decrease the strength of the connection from neuron to neuron and cause excitation or inhibition of a subsequent neuron.
This is where information is stored.
The information processing abilities of biological neural
systems must follow from highly parallel processes
operating on representations that are distributed over
many neurons. One motivation for ANN is to capture this
kind of highly parallel computation based on distributed
representations.
2. Neural Network Representation &
Architecture
An ANN is composed of processing elements.
Processing Elements
A processing element receives inputs, processes them, and delivers a single output.
The input can be raw
input data or the output
of other perceptrons.
The output can be the
final result (e.g. 1 means
yes, 0 means no) or it
can be inputs to other
perceptrons.
Biological vs. Artificial NN
Biological                 Artificial
---------------------------------------------------------------------
Soma: Nucleus              Node
Dendrites                  Input
Axon                       Output
Synapse                    Weight
Slow speed                 Fast speed
Many neurons (10^9)        Few neurons (a dozen to hundreds of thousands)
The Network
Each ANN is composed of a collection of processing
elements (PE) grouped in layers.
Note the three layers:
1. input
2. intermediate (called
the hidden layer)
3. output.
Several hidden layers
can be placed between
the input and output
layers.
Appropriate Problems for Neural Networks
ANN learning is well-suited to problems in which the training data corresponds to noisy, complex sensor data. It is also applicable to problems for which symbolic representations are used.
The back-propagation (BP) algorithm is the most commonly used ANN learning technique. It is appropriate for problems with the following characteristics:
Input is high-dimensional discrete or real-valued (e.g. raw sensor input)
Output is discrete or real valued
Output is a vector of values
Possibly noisy data
Long training times accepted
Fast evaluation of the learned function required.
Not important for humans to understand the weights
Examples:
Speech phoneme recognition
Image classification
Financial prediction
Information Processing in ANN
Input
One input node corresponds to a single attribute
e.g. for loan application, some applicant attributes can be age,
income level, and home ownership
Some available data need to be preprocessed into meaningful input data through symbolic representation or scaling, e.g. transforming non-numerical attributes into numerical ones
Output
Solution to the problem
E.g. in a loan application, the output can be yes or no, expressed as numeric values
Need to be post-processed. For instance, rounding output values
into 0 or 1
Connection Weights
Key elements of ANN
Express the relative strength (importance) of the input data or of the many connections that transfer data from layer to layer
Weights store the learned patterns of information through repeated weight adjustments
Information Processing in ANN (Cont.)
Summation function
Computes the weighted sum of all the input elements entering each PE
Formula: Y = SUM(i=1..n) Wi Xi
Transformation (transfer) function
Takes the form of an activation function
In relation to the output, it can be a linear or non-linear function
Example of a non-linear function
Sigmoid function: S-shaped function in the range of 0 to 1
Formula: f(Y) = 1 / (1 + e^-Y)
Normalizes output values into reasonable values, sometimes using a threshold value
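As an illustrative sketch (not from the original slides), a single PE combining the summation and sigmoid transfer functions above can be written as:

```python
import math

def pe_output(inputs, weights):
    """One processing element: weighted sum followed by a sigmoid transfer."""
    y = sum(w * x for w, x in zip(weights, inputs))  # Y = SUM(Wi * Xi)
    return 1.0 / (1.0 + math.exp(-y))                # f(Y) = 1 / (1 + e^-Y)

# Example: two inputs with weights 0.3 and 0.5
print(pe_output([1, 0], [0.3, 0.5]))  # ~0.574
```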
Hidden Layer
Complex practical applications require one or more hidden layers between the input and output layers
Typically 3~5 layers, each containing 10~10,000 PEs
Adding a hidden layer exponentially increases the training effort and the computational requirements
Architecture of ANN
[Figure: ANN architecture with input, hidden, and output layers]
3. Learning in ANN
Supervised Learning
Outputs of training samples are known
e.g. a historical data set of loan applications, with the success or failure of individuals to repay, has a set of input parameters and known outputs
The difference between the desired output and the actual output is used to calculate the weight updates of the ANN
Algorithm: back-propagation (BP)
Unsupervised Learning
Outputs are unknown
The network is self organizing
Humans need to examine the final categories/results and assign meaningful descriptions
Example of network: Kohonen network
Supervised Learning Algorithm
Iterative gradient-descent technique
Minimizes the error function between the actual output (Y) and the desired output (Z) for the given input data
Weight adjustment starts at the output node and propagates all the way back toward the input layer to reduce the delta
In the linear-function case, the problem has to be linearly separable!
delta = Zj - Yj
Wi(final) = Wi(initial) + alpha x delta x Xi
Alpha is the learning rate that controls the learning speed
Too large an alpha leads to too much correction, resulting in going back & forth among possible weight values, never reaching the optimal point
Too small an alpha may slow down the learning process
[Flowchart: Start -> Compute output -> Is desired output achieved? -> No: adjust weights and recompute; Yes: Stop]
Gradient-descent function minimization: the error function to be minimized is
E(W) = 1/2 x SUM(d in D) (Zd - Yd)^2
Example of Supervised Learning: OR Case

Case | X1 | X2 | Desired Result
-----|----|----|---------------
1    | 0  | 0  | 0
2    | 0  | 1  | 1
3    | 1  | 0  | 1
4    | 1  | 1  | 1
Example of Supervised Learning (Cont.)
Using alpha = 0.2, threshold = 0.5

Step | X1 | X2 | Z | W1  | W2  | Y | Delta | New W1 | New W2
-----|----|----|---|-----|-----|---|-------|--------|-------
1    | 0  | 0  | 0 | 0.1 | 0.3 | 0 | 0     | 0.1    | 0.3
     | 0  | 1  | 1 | 0.1 | 0.3 | 0 | 1     | 0.1    | 0.5
     | 1  | 0  | 1 | 0.1 | 0.5 | 0 | 1     | 0.3    | 0.5
     | 1  | 1  | 1 | 0.3 | 0.5 | 1 | 0     | 0.3    | 0.5
2    | 0  | 0  | 0 | 0.3 | 0.5 | 0 | 0     | 0.3    | 0.5
     | 0  | 1  | 1 | 0.3 | 0.5 | 0 | 1     | 0.3    | 0.7
     | 1  | 0  | 1 | 0.3 | 0.7 | 0 | 1     | 0.5    | 0.7
     | 1  | 1  | 1 | 0.5 | 0.7 | 1 | 0     | 0.5    | 0.7
3    | 0  | 0  | 0 | 0.5 | 0.7 | 0 | 0     | 0.5    | 0.7
     | 0  | 1  | 1 | 0.5 | 0.7 | 1 | 0     | 0.5    | 0.7
     | 1  | 0  | 1 | 0.5 | 0.7 | 0 | 1     | 0.7    | 0.7
     | 1  | 1  | 1 | 0.7 | 0.7 | 1 | 0     | 0.7    | 0.7
4    | 0  | 0  | 0 | 0.7 | 0.7 | 0 | 0     | 0.7    | 0.7
     | 0  | 1  | 1 | 0.7 | 0.7 | 1 | 0     | 0.7    | 0.7
     | 1  | 0  | 1 | 0.7 | 0.7 | 1 | 0     | 0.7    | 0.7
     | 1  | 1  | 1 | 0.7 | 0.7 | 1 | 0     | 0.7    | 0.7
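The trace above can be reproduced with a short delta-rule sketch (an illustrative implementation, assuming the strict step threshold implied by the table; it prints the updated weights in the last two columns):

```python
def train_or(alpha=0.2, threshold=0.5):
    """Delta-rule training on the OR cases, printing one trace row per case."""
    w1, w2 = 0.1, 0.3
    cases = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
    for step in range(1, 5):
        for (x1, x2), z in cases:
            y = 1 if w1 * x1 + w2 * x2 > threshold else 0  # step activation
            delta = z - y
            w1 += alpha * delta * x1  # Wi(final) = Wi(initial) + alpha*delta*Xi
            w2 += alpha * delta * x2
            print(step, x1, x2, z, y, delta, round(w1, 1), round(w2, 1))

train_or()
```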
4. Back-Propagation
Algorithm
Iterative gradient descent using a non-linear function (handles linearly inseparable cases)
Minimizes the error by differentiating the error function, which includes the non-linear function, for the weight update
delta (error) = (Zj - Yj)(df/dx)
Non-linear function, e.g. the sigmoid function f = [1 + exp(-SUM(Wi Xi))]^-1
df/dx = f(1 - f); the sigmoid is also called the logistic function, and this derivative keeps the error correction bounded
Wi(final) = Wi(initial) + alpha x (Zj - Yj) x f(1 - f) x Xi
Steps
Initialize weights with random values and set other parameters
Read in the input vector & the desired output
Compute the actual output by working forward through the layers
Compute the error
Change the weights by working backward from the output layer through the hidden layers
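A minimal sketch of the update rule above for a single sigmoid unit (illustrative only; a full BP implementation also propagates the deltas back through the hidden layers):

```python
import math

def sigmoid(y):
    return 1.0 / (1.0 + math.exp(-y))

def bp_update(weights, x, z, alpha=0.2):
    """One update for a single sigmoid unit:
    Wi(final) = Wi(initial) + alpha * (Z - Y) * f(1 - f) * Xi."""
    f = sigmoid(sum(w * xi for w, xi in zip(weights, x)))  # forward pass
    grad = (z - f) * f * (1 - f)                           # delta * df/dx
    return [w + alpha * grad * xi for w, xi in zip(weights, x)]

print(bp_update([0.1, 0.3], x=[1, 1], z=1))
```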
The Sigmoid Threshold Unit
[Figure: a sigmoid threshold unit]
The Sigmoid Threshold Unit (Cont.)
The sigmoid function σ(x) is also called the logistic function.
Interesting property:
Output ranges between 0 and 1, increasing monotonically with its input.
5. Remarks on BP Algorithm
Convergence and Local Minima
Gradient descent converges to some local minimum
Perhaps not the global minimum...
Heuristics to alleviate the problem of local minima
Add momentum
Use stochastic gradient descent rather than true
gradient descent.
Train multiple nets with different initial weights using
the same data.
6. ANN Application Development
The development process for an ANN application has eight steps.
Step 1: (Data collection) The data to be used for the training and
testing of the network are collected.
Important considerations are that the particular problem is amenable to
neural network solution and that adequate data exist and can be
obtained.
Step 2: (Training and testing data separation) Training data must
be identified, and a plan must be made for testing the
performance of the network.
The available data are divided into training and testing data sets.
For a moderately sized data set, 80% of the data are randomly selected for training, 10% for testing, and 10% for secondary testing (a split sketch follows below).
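An illustrative way to make this 80/10/10 random split (the function and names are hypothetical, not prescribed by the source):

```python
import random

def split_data(records, seed=0):
    """Randomly split records into 80% training, 10% testing, 10% secondary testing."""
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)
    n = len(shuffled)
    return (shuffled[:int(0.8 * n)],              # training set
            shuffled[int(0.8 * n):int(0.9 * n)],  # testing set
            shuffled[int(0.9 * n):])              # secondary testing set
```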
Step 3: (Network architecture) A network architecture and a
learning method are selected.
Important considerations are the exact number of PEs and the number of layers.
ANN Application Development
(Cont.)
Step 4: (Parameter tuning and weight initialization) There
are parameters for tuning the network to the desired
learning performance level.
Part of this step is initialization of the network weights and
parameters, followed by modification of the parameters as
training performance feedback is received.
Often, the initial values are important in determining the
effectiveness and length of training.
Step 5: (Data transformation) Transforms the application
data into the type and format required by the ANN.
Step 6: (Training) Training is conducted iteratively by
presenting input and desired or known output data to the
ANN.
The ANN computes the outputs and adjusts the weights until
the computed outputs are within an acceptable tolerance of
the known outputs for the input cases.
ANN Application Development
(Cont.)
Step 7: (Testing) Once the training has been completed, it
is necessary to test the network.
The testing examines the performance of the network using the
derived weights by measuring the ability of the network to classify
the testing data correctly.
Black-box testing (comparing test results to historical results) is the
primary approach for verifying that inputs produce the appropriate
outputs.
Step 8: (Implementation) A stable set of weights is now obtained.
Now the network can reproduce the desired output given inputs like
those in the training set.
The network is ready to use as a stand-alone system or as part of
another software system where new input data will be presented to it
and its output will be a recommended decision.
7. Benefits & Shortcomings of ANN
Benefits of ANNs
Usefulness for pattern recognition, classification, generalization, abstraction and interpretation of incomplete and noisy inputs (e.g. handwriting recognition, image recognition, voice and speech recognition, weather forecasting).
Providing some human characteristics to problem solving that
are difficult to simulate using the logical, analytical techniques of
expert systems and standard software technologies. (e.g.
financial applications).
Ability to solve new kinds of problems. ANNs are particularly
effective at solving problems whose solutions are difficult, if not
impossible, to define. This opened up a new range of decision
support applications formerly either difficult or impossible to
computerize.
Benefits & Shortcomings of ANN (Cont.)
Robustness. ANNs tend to be more robust than their
conventional counterparts. They have the ability to cope with
incomplete or fuzzy data. ANNs can be very tolerant of faults if
properly implemented.
Fast processing speed. Because they consist of a large number
of massively interconnected processing units, all operating in
parallel on the same problem, ANNs can potentially operate at
considerable speed (when implemented on parallel processors).
Flexibility and ease of maintenance. ANNs are very flexible in
adapting their behavior to new and changing environments. They
are also easier to maintain, with some having the ability to learn
from experience to improve their own performance.
ANN Shortcomings
ANNs do not produce an explicit model, even though new cases can be fed into them and new results obtained.
ANNs lack explanation capabilities. Justifications for results are difficult to obtain because the connection weights usually do not have obvious interpretations.
8. SOME ANN APPLICATIONS
ANN application areas:
Tax form processing to identify tax fraud
Enhancing auditing by finding irregularities
Bankruptcy prediction
Customer credit scoring
Loan approvals
Credit card approval and fraud detection
Financial prediction
Energy forecasting
Computer access security (intrusion detection and classification
of attacks)
Fraud detection in mobile telecommunication networks
Customer Loan Approval
Problem Statement
Many stores are now offering their customers the possibility of applying for a
loan directly at the store, so that they can proceed with the purchase of
relatively expensive items without having to put up the entire capital all at
once.
Initially this practice of offering consumer loans was found only in connection
with expensive purchases, such as cars, but it is now commonly offered at
major department stores for purchases of washing machines, televisions,
and other consumer goods.
The loan applications are filled out at the store and the consumer deals only
with the store clerks for the entire process. The store, however, relies on a
financial company (often a bank) that handles such loans, evaluates the
applications, provides the funds, and handles the credit recovery process
when a client defaults on the repayment schedule.
For this study, there were 1000 records of consumer loan applications that had been granted by a bank, together with an indication of whether each loan had always been paid on schedule or there had been any problem.
The provided data did not make a more detailed distinction about the
kind of problem encountered by those bad loans, which could range
from a single payment that arrived late to a complete defaulting on the
loan.
Customer Loan Approval (Cont.)
ANN Application to Loan Approval
Each application had 15 variables that included the number of
members of the household with an income, the amount of the loan
requested, whether or not the applicant had a phone in his/her house,
etc.
Table 1: Input and output variables
Input variables               Variable values
-----------------------------------------------------------------------
1  N of relatives             from 1 to total components
2  N of relatives with job    from 0 to total components
3  Telephone number           0, 1
4  Real estate                0, 1
5  Residence seniority        from 0 to date of loan request
6  Other loans                0, 1, 2
7  Payment method             0, 1
8  Job type                   0, 1, 2, 3
9  Job seniority              from 0 to date of loan request
10 Net monthly earnings       integer value
11 Collateral                 0, 1, 2
12 Loan type                  0, 1, 2, 3
13 Amount of loan             integer value
14 Amount of installment      integer value
15 Duration of loan           integer value
-----------------------------------------------------------------------
Customer Loan Approval (Cont.)
Computed output variable
1  Repayment probability          from 0 to 100
Desired output variable
1  Real result of granted loan    0 if payment irregular or null; 100 if payment on schedule
Some of these variables were numerical (e.g. the number of relatives), while others used a digit as a label to indicate a specific class (e.g. the values 0,1,2,3 of variable 8 referred to four different classes of employment).
For each record, a single variable indicated whether the loan was extinguished without any problem (Z=100) or with some problem (Z=0).
In its a-posteriori analysis, the bank classified loans with Z=0 as bad loans. In the provided data, only about 6% of the loans were classified as bad. Thus, any ANN that classifies loans from a similar population ought to make errors in a percentage substantially lower than 6% to be of any use (otherwise, one could simply classify all loans as good, resulting in an error on 6% of the cases).
Customer Loan Approval (Cont.)
Out of 1000 available records, 400 were randomly selected as a
training set for the configuration of the ANN, while the remaining
600 cases were then supplied to the configured ANN so that its
computed output could be compared with the real value of
variable Z.
Besides the network topology, there are many parameters that must be set. One of the most critical is the number of neurons constituting the hidden layer: too few neurons can hold up the convergence of the training process, while too many may result in a network that learns the training cases very accurately (straight memorization) but is unable to generalize what it has learned to the new cases in the testing set.
The research team selected a network with 10 hidden nodes as
the one that provided the most promising performance; the
number of iterations was set to 20,000 to allow a sufficient
degree of learning, without loss of performance in generalization
capability.
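If one were to reconstruct this setup with a modern library such as scikit-learn (not the software used in the 1998 study), the configuration might look roughly like:

```python
from sklearn.neural_network import MLPRegressor

# Hypothetical reconstruction: 15 input variables, 10 hidden nodes,
# 20,000 training iterations, output trained toward Z = 0 (bad) or Z = 100 (good).
model = MLPRegressor(hidden_layer_sizes=(10,), activation="logistic",
                     max_iter=20000, random_state=0)
# model.fit(X_train, Z_train)      # the 400 randomly selected training records
# scores = model.predict(X_test)   # the remaining 600 testing records
```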
Customer Loan Approval (Cont.)
The single output of our network turned out to be in the range
from -30 to +130, whereas the corresponding real output was
limited to the values Z=0 or Z=100. A negative value of the
output would indicate a very bad loan and thus negative values
were clamped to zero; similarly, output values above 100 were
assigned the value of 100.
A 30% tolerance was used on the outputs, so that loans were classified as good if the ANN computed a value above 70, and bad if their output was less than 30. Loans that fell in the intermediate band [30, 70] were left unclassified. The width of this band is probably overly conservative, and a smaller one would have sufficed, at the price of possibly granting marginal loans or refusing loans that could have turned out to be good in the end. The rationale for the unclassified band is to provide an alarm requesting a more detailed examination.
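A small sketch of this post-processing (clamping plus the [30, 70] tolerance band); the function name is illustrative:

```python
def classify_loan(raw_output):
    """Clamp the ANN output to [0, 100], then apply the tolerance band."""
    score = max(0, min(100, raw_output))  # negatives -> 0, values above 100 -> 100
    if score > 70:
        return "good"
    if score < 30:
        return "bad"
    return "unclassified"  # band [30, 70]: flag for detailed human examination

print(classify_loan(-12), classify_loan(85), classify_loan(55))
# bad good unclassified
```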
Customer Loan Approval (Cont.)
This specific ANN was then supplied with the remaining 600
cases of the testing set.
This set contained 38 cases that had been classified as bad
(Z=0), while the remaining 562 cases had been repaid on
schedule.
Clearly the ANN separates the given cases into two non-overlapping bands: the good ones near the top and the bad ones near the bottom. No loan was left unclassified, so in this case there would have been no cases requiring additional (human) intervention.
The ANN made exactly three mistakes in the classification of the test cases: those were 3 cases that the ANN classified as good loans, whereas in reality they turned out to be bad. Manual, a-posteriori inspection of the values of their input variables did not reveal any obvious symptoms that they were problem cases. What likely happened is that the applicants did not repay the loans as scheduled due to some completely unforeseen and unpredictable circumstance. This is also supported by the fact that the bank officers themselves approved those three loans, so one must presume that they did not look too risky at application time.
Customer Loan Approval (Cont.)
The ANN, however, was more discriminating than the bank officers, since the ANN would have denied 35 loan applications that scored less than 30.
As it turns out, all those 35 loans had problems with their repayments, and thus the bank would have been well advised to heed the network's classification and deny those 35 applications. Had the bank followed that advice, 268 million liras would not have been put in jeopardy by the bank (out of a total of more than 3 billion liras of granted loans that were successfully repaid).
--------------------------------------------------------
F. D. Nittis, G. Tecchiolli & A. Zorat, "Consumer Loan Classification Using Artificial Neural Networks," ICSC EIS'98 Conference, Spain, Feb. 1998.
Customer Loan Approval (Cont.)
[Figure: loan classification by ANN plotted against loan number]
Bankruptcy Prediction
There has been a lot of work on developing neural networks to predict bankruptcy using financial ratios and discriminant analysis.
The ANN paradigm selected in the design phase for this problem
was a three-layer feed-forward ANN using back-propagation.
The data for training the network consisted of a small set of
numbers for well-known financial ratios, and data were available
on the bankruptcy outcomes corresponding to known data sets.
Thus, a supervised network was appropriate, and training time
was not a problem.
Application Design
There are five input nodes, corresponding to five financial ratios:
X1: Working capital/total assets
X2: Retained earnings/total assets
X3: Earnings before interest and taxes/total assets
X4: Market value of equity/total debt
X5: Sales/total assets
Bankruptcy Prediction (Cont.)
A single output node gives the final classification showing
whether the input data for a given firm indicated a potential
bankruptcy (0) or nonbankruptcy (1).
The data source consists of financial ratios for firms that did or
did not go bankrupt between 1975 and 1982.
Financial ratios were calculated for each of the five aspects
shown above, each of which became the input for one of the five
input nodes.
For each set of data, the actual result, whether or not bankruptcy occurred, can be compared to the neural network's output to measure the performance of the network and monitor the training.
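As an illustration of such a three-layer feed-forward pass (the weights and helper names here are hypothetical, not the study's actual parameters):

```python
import math

def sigmoid(y):
    return 1.0 / (1.0 + math.exp(-y))

def predict_bankruptcy(ratios, w_hidden, w_out):
    """Feed-forward pass: 5 ratio inputs -> hidden layer -> 1 output node.
    An output near 0 indicates potential bankruptcy, near 1 nonbankruptcy."""
    hidden = [sigmoid(sum(w * x for w, x in zip(ws, ratios))) for ws in w_hidden]
    out = sigmoid(sum(w * h for w, h in zip(w_out, hidden)))
    return 1 if out >= 0.5 else 0

# ratios = [X1, X2, X3, X4, X5], the five financial ratios listed above
```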
ANN Architecture
The architecture of the ANN is shown in the following figure
Bankruptcy Prediction (Cont.)
[Figure: bankruptcy prediction ANN with five input nodes, a hidden layer, and one output node]
Bankruptcy Prediction (Cont.)
Training
The data set, consisting of 129 firms, was partitioned into
a training set and a test set. The training set of 74 firms
consisted of 38 that went bankrupt and 36 that did not.
The needed ratios were computed and stored in the
input file to the neural network and in a file for a
conventional discriminant analysis program for
comparison of the two techniques.
The neural network has three important parameters to be
set: learning threshold, learning rate, and momentum.
The learning threshold allows the developer to vary the
acceptable overall error for the training case.
The learning rate and momentum allow the developer to control
the step sizes the network uses to adjust the weights as the
errors between computed and actual outputs are fed back.
Bankruptcy Prediction (Cont.)
Testing
The neural network was tested in two ways: by using the test data
set and by comparison with discriminant analysis. The test set
consisted of 27 bankrupt and 28 non-bankrupt firms. The neural
network was able to correctly predict 81.5% of the bankrupt cases
and 82.1% of the non-bankrupt cases.
Overall, the ANN did much better, correctly predicting 22 out of the 27 actual bankruptcy cases (the discriminant analysis predicted only 16 cases correctly).
An analysis of the errors showed that 5 of the bankrupt firms classified as
non-bankrupt were also misclassified by the discriminant analysis
method. A similar situation occurred for the non-bankrupt cases.
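The reported percentages follow from the test-set counts (the non-bankrupt correct count below is inferred from the 82.1% figure):

```python
# Test set: 27 bankrupt and 28 non-bankrupt firms
print(22 / 27)  # ~0.815 -> the reported 81.5% of bankrupt cases
print(23 / 28)  # ~0.821 -> the reported 82.1% of non-bankrupt cases
```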
The result of the testing showed that neural network
implementation is at least as good as the conventional
approach. An accuracy of about 80% is usually
acceptable for ANN applications. At this level, a system
is useful because it automatically identifies problem
situations for further analysis by a human expert.
---------------------------------------------------------------------------------------------------------------
R.L. Wilson and R. Sharda, Bankruptcy Prediction Using Neural Networks,
Decision Support Systems, Vol. 11, No. 5, June 1994, pp. 545-557.
Assignment!
1. Explore the WEKA software.
2. Use the available training & testing data to be analyzed using an ANN algorithm.
3. Write a paper!
4. Present your results!
The details are available in e-learning!