THEME ARTICLE

The Metropolis Algorithm
ISABEL BEICHL
NIST

FRANCIS SULLIVAN
IDA/Center for Computing Science

1521-9615/00/$10.00 © 2000 IEEE
JANUARY/FEBRUARY 2000

The story goes that Stan Ulam was in a Los Angeles hospital recuperating and, to stave off boredom, he tried computing the probability of getting a “perfect” solitaire hand. Before long, he hit on the idea of using random sampling: Choose a solitaire hand at random. If it is perfect, let count = count + 1; if not, let count = count. After M samples, take count/M as the probability. The hard part, of course, is deciding how to generate a uniform random hand. What’s the probability distribution to draw from, and what’s the algorithm for drawing a hand?

Somewhat later, John von Neumann provided part of the answer, but in a different context. He introduced the rejection algorithm for simulating neutron transport. In brief, if you want to sample from some specific probability distribution, simply sample from any distribution you have handy, but keep only the good samples. (Von Neumann discussed his approach in a letter to Bob Richtmyer [11 Mar. 1947] and in a later letter to Ulam [21 May 1947]. Interestingly, the letter to Richtmyer contains a fairly detailed program for the Eniac, while the one to Ulam gives an explanation augmented by what we would today call pseudocode.)

Since the rejection method’s invention, it has been developed extensively and applied in a wide variety of settings. The Metropolis Algorithm can be formulated as an instance of the rejection method used for generating steps in a Markov chain. This is the approach we will take.

The rejection algorithm

First, let’s look at the setup for the rejection algorithm itself. We want to sample from a population (for example, solitaire hands or neutron trajectories) according to some probability distribution function, ν, that is known in theory but hard to sample from in practice. However, we can easily sample from some related probability distribution function µ. So we do this:

1. Use µ to select a sample, x.
2. Evaluate ν(x). This should be easy, once we have x.
3. Generate a uniform random ρ ∈ [0, 1);
   if ρ < cν(x)/µ(x)
   then accept x
   else try again with another x.

Here we choose c so that cν(x)/µ(x) < 1 for all x.

First, the probability of selecting and then accepting some x is

$$c\,\frac{\nu(x)}{\mu(x)}\,\mu(x) = c\,\nu(x).$$

Also, if we are collecting samples to estimate the weighted mean, S(f), of some function f(x)—that is, S(f) = Σ f(x)cν(x)—we could merely select some large number, M, of samples by using µ(x), reject none of them, and then compute the uniform mean:

$$\frac{1}{M}\sum f(x)\,c\,\frac{\nu(x)}{\mu(x)}.$$

That is, if we don’t reject, the ratios give us a sample whose mean converges to the mean for the limiting probability distribution function ν. This method for estimating a sum is an instance of importance sampling, because it attempts to choose samples according to their importance. The ideal importance function is

$$\mu(x) = \frac{f(x)\,\nu(x)}{\sum_y f(y)\,\nu(y)}.$$

The alert reader will have noticed that this µ(x) requires knowledge of the answer. However, importance sampling works for a less-than-perfect µ(x). This is because the fraction of the samples that equal any particular x will converge to µ(x), so the sample mean of f(x)cν(x)/µ(x) will converge to the true mean. For the special case of the constant function, f(x) = 1, the quantity S is the probability of a “success” on any particular trial of the rejection method. If we take f to be the function that is identically equal to one, we might know the value of S in advance. In that case, we also know the rejection rate 1/S, which is the average number of trials before each success. As we shall see, when we use rejection in its formulation as the Metropolis Algorithm, prior knowledge of the rejection rate leads to a more efficient method called Monte Carlo time.1

Applications: The Metropolis Algorithm

We first look at two important applications of the Metropolis Algorithm—the Ising model and simulated annealing—and then we examine the problem of counting.

The Ising model

This model is one of the most extensively studied systems in statistical physics. It was developed early in the 20th century as a model of magnetization and related phenomena. The model is a 2D or 3D regular array of spins σ_i ∈ {−1, 1} and an associated energy E(σ) for each configuration. A configuration is any particular choice for the spins, and each configuration has the associated energy

$$E(\sigma) = -\sum_{i,j} J_{i,j}\,\sigma_i\,\sigma_j - B\sum_k \sigma_k.$$

The sum is over those {i, j} pairs that interact (usually nearest neighbors). J_{i,j} is the interaction coefficient (often constant), and B is another constant related to the external magnetic field.

In most applications, we want to estimate a mean of some function f(σ) because such quantities give us a first-principles estimate of some fundamental physical quantity. In the Ising model, the mean F is taken over all configurations:

$$F = \frac{1}{Z(T)}\sum_\sigma f(\sigma)\,\exp\!\bigl(-E(\sigma)/\kappa T\bigr).$$

But here, the weights come from the expression for the configuration’s energy. The normalizing factor Z(T) is the partition function:

$$Z(T) = \sum_\sigma \exp\!\bigl(-E(\sigma)/\kappa T\bigr).$$

T is the temperature and κ is the Boltzmann constant.

A natural importance-sampling approach might be to select configurations from the distribution

$$\mu(\sigma) = \frac{\exp\!\bigl(-E(\sigma)/\kappa T\bigr)}{Z(T)},$$

so that the sample mean of M samples,

$$\bar{F} = \frac{\sum_k f(\sigma_k)}{M},$$

will converge rapidly to the true mean, F.

The problem, of course, is finding a way to sample from µ. In this case, sampling from the proposed “easy” distribution µ is not so simple. Nick Metropolis and his colleagues made the following brilliant observation.2 If we change only one spin, the change in energy, ∆E, is easy to evaluate, because only a few terms of the sum change. This observation gives a way of constructing an easily computed rejection step.
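To make the locality of ∆E concrete, here is a minimal sketch in Python. It is not from the article; the 2D lattice, constant coupling J, zero external field B, and periodic boundaries are illustrative assumptions:

```python
def delta_E(spins, i, j, J=1.0):
    """Energy change from flipping spin (i, j) in a 2D nearest-neighbor
    Ising model with constant coupling J and no external field.
    Only the four pair terms involving this spin change."""
    n = len(spins)
    s = spins[i][j]
    # Sum of the four nearest neighbors (periodic boundary conditions).
    nbrs = (spins[(i - 1) % n][j] + spins[(i + 1) % n][j] +
            spins[i][(j - 1) % n] + spins[i][(j + 1) % n])
    # E contains -J * s * neighbor for each neighboring pair; flipping s
    # negates those four terms, so the change is 2 * J * s * nbrs.
    return 2.0 * J * s * nbrs
```

Because only the four neighbor terms change, each proposed flip costs a constant amount of work regardless of lattice size.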
In the Metropolis formulation, we propose a flip of a single spin at one of the n sites, chosen uniformly, and the combined probability of proposing and accepting the move is

$$c\,\nu(\sigma) = \frac{\min\!\bigl(1,\ \exp(-\Delta E(\sigma)/\kappa T)\bigr)}{n},$$

so that we take 1/n as the “easy” probability. That is, we select a site uniformly and accept it according to the Metropolis criterion we just described.

For this case, the expression for the success rate is

$$S = \sum_i c\,\nu(\sigma_i) = \sum_i \frac{\min\!\bigl(1,\ \exp(-\Delta E(\sigma_i)/\kappa T)\bigr)}{n}.$$

So, the probability of exactly k rejections followed by a success is the same as the probability that a random ρ satisfies (1 − S)^{k+1} < ρ < (1 − S)^k, giving this stochastic expression for the waiting time:

$$k = \frac{\log(\rho)}{\log(1 - S)}.$$

We can use this to avoid the rejection steps while still “counting” how many rejections would have occurred.1

In principle, this Monte Carlo-time method works with any rejection formulation. However, each stage requires explicit knowledge of all possible next steps. In other words, we need the values for the “difficult” distribution ν(x). In the Metropolis Ising case, the Markov chain formulation makes this feasible.

Simulated annealing

Suppose we wish to maximize or minimize some real-valued function defined on a finite (but large) set. The classic example is the traveling salesman’s problem. The function is the tour’s length, and the finite set is the set of all possible tours. […]

The answers seem to be

1. Yes. A large literature covers both the theory and applications in many different settings. However, if T > 0, the limit probability distribution will be nonzero for nonoptimal tours. The way around this is to decrease T as the computation proceeds. Usually, T decreases like log(s_k) for some positive, decreasing sequence of “cooling schedule” values s_k, so that the acceptance probability decreases linearly until only the true minimum is accepted.
2. It depends. Designing cooling schedules to optimize the solution time is an active research topic.
3. Someone should investigate this carefully.

Counting

Let’s reconsider Ulam’s original question in a slightly more general form: How many members of a population P have some specific property U? We could do the counting by designing a Markov chain that walks through P and has a limit probability distribution ν that is somehow related to our interesting property U. To be more concrete, P might be the set of partial matchings of a bipartite graph G, and U might be the set of matchings that are “perfect,” meaning they include every graph node.

To have our Markov chain do what we want, we define a partition function:
$$Z(\lambda) = \sum_k m_k \lambda^k.$$

The partition function is associated with the probability distribution

$$\nu(k) = \frac{m_k\,\lambda^k}{Z(\lambda)}.$$

Here, m_k is the number of k-matchings, and λ^k plays a role similar to that played by exp(−E(σ)/κT) in the Ising problem. On each step, if the move selected is from a k-matching to a (k + 1)-matching, the probability of doing so is λ. Mark Jerrum and Alistair Sinclair show that the fraction of the samples that are k-matchings can be used to estimate the m_k to whatever accuracy is desired and that, for fixed accuracy, the time for doing so is a polynomial in the problem’s size.3 Physicists call estimating the m_k the monomer-dimer problem because having a k-matching means that k pairs have been matched as dimers and the unmatched are monomers.

Rapid mixing

Jerrum and Sinclair have provided convergence results and applications to important combinatorial problems, such as the monomer-dimer problem.3 To obtain their results, they look for a property they call rapid mixing for Markov chains. Jerrum has also proved some “slow convergence” results showing that, in some situations, Metropolis sampling does not mix rapidly and so converges too slowly to be practical.4

Coupling from the past

The Metropolis Algorithm and its generalizations have come to be known as the Monte Carlo Markov Chain technique (MCMC) because they simulate a Markov chain in the hope of sampling from the limit distribution. For the Ising model, this limit distribution is

$$\nu(\sigma) = \frac{\exp\!\bigl(-E(\sigma)/\kappa T\bigr)}{Z(T)}.$$
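As a concrete illustration of the whole procedure, the single-spin-flip Metropolis chain described earlier can be sketched as follows. This is a minimal sketch, not the authors' code; the lattice size, temperature (with the Boltzmann constant κ absorbed into T), step count, and zero-field assumption are arbitrary illustrative choices:

```python
import math
import random

def metropolis_ising(n=8, T=2.5, steps=20000, J=1.0, seed=1):
    """Single-spin-flip Metropolis sampling of a 2D Ising model
    (constant J, zero field, periodic boundaries). Returns the
    sample mean of the magnetization per spin."""
    rng = random.Random(seed)
    spins = [[rng.choice([-1, 1]) for _ in range(n)] for _ in range(n)]
    mag = sum(sum(row) for row in spins)
    total = 0.0
    for _ in range(steps):
        # Propose a site uniformly: the "easy" 1/n probability.
        i, j = rng.randrange(n), rng.randrange(n)
        s = spins[i][j]
        nbrs = (spins[(i - 1) % n][j] + spins[(i + 1) % n][j] +
                spins[i][(j - 1) % n] + spins[i][(j + 1) % n])
        dE = 2.0 * J * s * nbrs  # only four terms of the energy sum change
        # Metropolis criterion: accept with probability min(1, exp(-dE/T)).
        if dE <= 0 or rng.random() < math.exp(-dE / T):
            spins[i][j] = -s
            mag -= 2 * s
        total += mag / (n * n)
    return total / steps
```

Selecting a site uniformly supplies the "easy" proposal distribution, and the min(1, exp(−∆E/κT)) test is exactly the acceptance rule from the rejection formulation above.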
Progress in MCMC has been impressive and seems to be accelerating. Problems that appeared impossible have been solved. For combinatorial counting problems, recent advances have been remarkable. However, two things should be borne in mind.

The first is a famous remark attributed to von Neumann: Anyone using Monte Carlo is in a state of sin. We might add that anyone using MCMC is committing an especially grievous offense. Monte Carlo is a last resort, to be used only when no exact analytic method or even finite numerical algorithm is available. And, except for CFTP, the prescription for use always contains the phrase “simulate for a while,” meaning until you feel as if you’re at the limit distribution. As we mentioned, for the Metropolis method, there are even systems for which convergence is provably slow. The antiferromagnetic Ising model is one such case. In some situations, no randomized method, including MCMC, will converge rapidly.

The second thing to bear in mind is that MCMC is only one of many possible importance-sampling techniques. For several cases, including the dimer cover problem, the ability to approximate the limit distribution […]
References

1. I. Beichl and F. Sullivan, “(Monte-Carlo) Time after Time,” IEEE Computational Science & Eng., Vol. 4, No. 3, July–Sept. 1997, pp. 91–95.
2. N. Metropolis et al., “Equation of State Calculations by Fast Computing Machines,” J. Chemical Physics, Vol. 21, 1953, pp. 1087–1092.
3. M. Jerrum and A. Sinclair, “The Markov Chain Monte Carlo Method: An Approach to Counting and Integration,” Approximation Algorithms for NP-Hard Problems, Dorit Hochbaum, ed., PWS (Brooks/Cole Publishing), Pacific Grove, Calif., 1996, pp. 482–520.
4. V. Gore and M. Jerrum, “The Swendsen-Wang Process Does Not Always Mix Rapidly,” Proc. 29th ACM Symp. Theory of Computing, ACM Press, New York, 1997, pp. 157–165.
5. J.G. Propp and D.B. Wilson, “Exact Sampling with Coupled Markov Chains and Applications to Statistical Mechanics,” Random Structures and Algorithms, Vol. 9, Nos. 1 & 2, 1996, pp. 223–252.
6. I. Beichl and F. Sullivan, “Approximating the Permanent via Importance Sampling with Application to the Dimer Covering Problem,” J. Computational Physics, Vol. 149, No. 1, Feb. 1999, pp. 128–147.

Isabel Beichl is a mathematician in the Information Technology Laboratory at the National Institute of Standards and Technology. Contact her at NIST, Gaithersburg, MD 20899; isabel@cam.nist.gov.

Francis Sullivan is the associate editor-in-chief of CiSE and director of the Institute for Defense Analyses’ Center for Computing Sciences. Contact him at the IDA/Center for Computing Sciences, Bowie, MD 20715; fran@super.org.