Optimization methods

• Gradient Descent
• Conjugate Gradient
• Levenberg-Marquardt
• Quasi-Newton
• Evolutionary methods
Common characteristics of Derivative-Free Optimization Methods
1. Derivative freeness
2. Intuitive guidelines (PSO: nature's wisdom; GA: evolution; SA: thermodynamics)
3. Slowness
4. Flexibility: they can handle any type of objective function
5. Randomness: they all use random numbers
6. Analytic studies are not possible, because of their randomness and problem-specific nature; most of our knowledge about them is based on empirical studies
7. Iterative nature, so we need some stopping criteria
Optimization Problem
In the training samples shown below, i stands for inputs and t for targets. There are three training vectors and two output neurons.

$(i_{11}, i_{21}) \rightarrow (t_{11}, t_{21})$
$(i_{12}, i_{22}) \rightarrow (t_{12}, t_{22})$
$(i_{13}, i_{23}) \rightarrow (t_{13}, t_{23})$
First, randomly generate the weights and apply all the inputs. Let the outputs of the two output neurons be $o_{11}$ and $o_{21}$ when the first input is applied, and similarly for the other inputs. The error for each sample is

$e_1 = (t_{11} - o_{11})^2 + (t_{21} - o_{21})^2$
$e_2 = (t_{12} - o_{12})^2 + (t_{22} - o_{22})^2$
$e_3 = (t_{13} - o_{13})^2 + (t_{23} - o_{23})^2$

• The root mean square of the error is

$e = \sqrt{\dfrac{e_1 + e_2 + e_3}{3}}$
• If it is a maximization problem, then the objective function f is

$f = \dfrac{1}{e}$
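Concretely, the error and objective for the three samples can be computed as follows (a minimal NumPy sketch; the target and output values are invented for illustration):

import numpy as np

# Three hypothetical training samples, two output neurons each
targets = np.array([[0.1, 0.9],    # (t11, t21)
                    [0.8, 0.2],    # (t12, t22)
                    [0.5, 0.5]])   # (t13, t23)
outputs = np.array([[0.3, 0.7],    # (o11, o21) from the randomly weighted net
                    [0.6, 0.4],
                    [0.4, 0.6]])

errs = np.sum((targets - outputs) ** 2, axis=1)  # e1, e2, e3
e = np.sqrt(errs.mean())                         # root mean square error
f = 1.0 / e                                      # objective for maximization
print(e, f)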
Particle Swarm Optimization (PSO)
• PSO is a robust stochastic optimization technique based on
the movement and intelligence of swarms.
• PSO applies the concept of social interaction to problem
solving.
• It was developed in 1995 by James Kennedy (a social psychologist) and Russell Eberhart (an electrical engineer).
• It uses a number of agents (particles) that constitute a
swarm moving around in the search space looking for the
best solution.
• Each particle is treated as a point in an N-dimensional space which adjusts its "flying" according to its own flying experience as well as the flying experience of other particles.
Social Dynamics Theory
• Individuals searching for solutions learn from the experiences of others (individuals learn from their neighbors)
• Individuals that interact frequently become similar
• Culture affects the performance of the individuals that comprise it
Like the societies of the birds and the bees, people are motivated to
maintain a certain proximity and heading with each other, not in the
physical world but in the social world of beliefs and desires.
About intelligence
• Social behavior increases the ability of an individual.
• There is a relationship between adaptability and intelligence.
• Intelligence arises from interactions among individuals.

Bird flocking, fish schooling, and swarming theory


• PSO simulates the behavior of bird flocking
• Imagine a flock of birds looking for a wheat field:
they maintain their velocity as they move towards
the field, then when their target is beneath them,
some of them overshoot. Realising their error, they
double back towards the centre of the flock.
Gradually, through a process of circling and swinging
around the centre of the flock they all settle in a
good spot. PSO is an abstraction of this process, with
the physical constraints removed, thus each
individual automatically knows where the most
successful individual is (more effective and quicker
than calculating the centre of the swarm) and suffers
no physical constraints in changing direction and
location.
• In PSO, each single solution is a "bird" in the search space. We call it a "particle".
• All particles have fitness values, which are evaluated by the fitness function to be optimized, and velocities, which direct the flying of the particles.
• The particles fly through the problem space by following the
current optimum particles.
• A number of particles are initialized randomly within the
search space.
• Each particle has a very simple 'memory' of its personal best
solution so far, called 'pbest'. This is the best solution (fitness)
it has achieved so far.
• The global best solution for each iteration is also found and labelled 'gbest'. It is the best value obtained so far by any particle in the population.
• On each iteration, every particle is moved a certain distance from its current location, pulled by random amounts towards the pbest and gbest values.
PSO Process
1. Initialize the population.
2. Evaluate the fitness of the individual particles.
3. Modify velocities based on the previous best and global (or neighborhood) best.
4. If the termination condition is met, stop.
5. Otherwise, go to step 2.
Velocity and Position Updating

(a) $v_{id} = w \, v_{id} + c_1 \cdot \mathrm{rand}() \cdot (p_{id} - x_{id}) + c_2 \cdot \mathrm{rand}() \cdot (p_{gd} - x_{id})$

(b) $x_{id} = x_{id} + v_{id}$

where $x_{id}$ and $v_{id}$ are the position and velocity of particle $i$ on dimension $d$, $p_{id}$ is its pbest, $p_{gd}$ is the gbest, $w$ is the inertia weight, and $c_1$, $c_2$ are the learning factors.
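As a direct transcription of equations (a) and (b), one particle's update step might look like this in Python (a minimal sketch; the fixed w here stands in for the decaying inertia weight used in the algorithm below):

import random

w, c1, c2 = 0.7, 2.0, 2.0   # illustrative inertia weight and learning factors

def update_particle(v, x, pbest, gbest):
    """Apply equations (a) and (b) to a single particle."""
    v_new = [w * vd
             + c1 * random.random() * (pd - xd)   # pull towards pbest
             + c2 * random.random() * (gd - xd)   # pull towards gbest
             for vd, xd, pd, gd in zip(v, x, pbest, gbest)]   # equation (a)
    x_new = [xd + vd for xd, vd in zip(x, v_new)]             # equation (b)
    return v_new, x_new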
The algorithm

For each particle
    Initialize particle
End

Do
    w = wMax - [(wMax - wMin) × iter] / maxIter
    For each particle
        Calculate fitness value
        If the fitness value is better than the best fitness value (pBest) in history
            Set current value as the new pBest
    End
    Choose the particle with the best fitness value of all the particles as the gBest
    For each particle
        Calculate particle velocity according to equation (a)
        Update particle position according to equation (b)
    End
While maximum iterations or minimum error criterion is not attained

Particles' velocities on each dimension are clamped to a maximum velocity Vmax, a parameter specified by the user. If the sum of accelerations would cause the velocity on a dimension to exceed Vmax, the velocity on that dimension is limited to Vmax.
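The pseudocode translates into a short program. Below is a minimal Python sketch of a global-best PSO for maximization, with the linearly decaying inertia weight and the Vmax clamping described above; the parameter defaults are illustrative choices rather than prescribed values:

import random

def pso(fitness, dim, x_min, x_max, n_particles=30,
        max_iter=100, c1=2.0, c2=2.0, w_max=0.9, w_min=0.4):
    """Maximize `fitness` with a basic global-best PSO."""
    v_max = x_max - x_min   # Vmax set to the range of the particle, as above

    # Initialize positions randomly within the search space, velocities at zero
    x = [[random.uniform(x_min, x_max) for _ in range(dim)]
         for _ in range(n_particles)]
    v = [[0.0] * dim for _ in range(n_particles)]

    pbest = [xi[:] for xi in x]                # personal best positions
    pbest_fit = [fitness(xi) for xi in x]      # personal best fitnesses
    g = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]

    for it in range(max_iter):
        w = w_max - (w_max - w_min) * it / max_iter   # decaying inertia weight
        for i in range(n_particles):
            for d in range(dim):
                # Equation (a), with the velocity clamped to [-Vmax, Vmax]
                v[i][d] = (w * v[i][d]
                           + c1 * random.random() * (pbest[i][d] - x[i][d])
                           + c2 * random.random() * (gbest[d] - x[i][d]))
                v[i][d] = max(-v_max, min(v_max, v[i][d]))
                # Equation (b), keeping the particle inside the search range
                x[i][d] = max(x_min, min(x_max, x[i][d] + v[i][d]))
            fit = fitness(x[i])
            if fit > pbest_fit[i]:             # update pbest
                pbest[i], pbest_fit[i] = x[i][:], fit
                if fit > gbest_fit:            # update gbest
                    gbest, gbest_fit = x[i][:], fit
    return gbest, gbest_fit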
List of the parameters and their typical values

Number of particles: the typical range is 20-40. For most problems, 10 particles are enough to get good results; for some difficult or special problems, one can try 100 or 200 particles.

Dimension of particles: determined by the problem to be optimized.

Range of particles: also determined by the problem to be optimized; you can specify different ranges for different dimensions of the particles.

Vmax: determines the maximum change one particle can take during one iteration. Usually we set the range of the particle as Vmax; for example, for the particle (x1, x2, x3) with x1 ∈ [-10, 10], Vmax = 20.

Learning factors: c1 and c2 are usually equal to 2, although other settings have been used in different papers; usually c1 equals c2 and ranges over [0, 4].
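Following these guidelines, a typical call to the pso() sketch above for the three-dimensional particle (x1, x2, x3) on [-10, 10] might be (the quadratic objective is a made-up placeholder):

# Illustrative settings: 30 particles (typical range 20-40), Vmax = 20
# because each dimension spans [-10, 10], and c1 = c2 = 2
best_x, best_f = pso(fitness=lambda x: -sum(xi ** 2 for xi in x),
                     dim=3, x_min=-10.0, x_max=10.0,
                     n_particles=30, max_iter=100, c1=2.0, c2=2.0)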
Schwefel's function

$f(x) = \sum_{i=1}^{n} x_i \sin\left(\sqrt{|x_i|}\right)$

where $-500 \le x_i \le 500$

global maximum:
$f(x^*) = n \times 418.9829$ at $x_i = 420.9687,\; i = 1:n$
[Plots: evolution of the swarm at initialization and after 5, 10, 15, 20, 100, and 500 iterations]
Search result

Iteration    Swarm best
0            416.245599
5            515.748796
10           759.404006
15           793.732019
20           834.813763
100          837.911535
5000         837.965771
Global       837.9658
