|
Many practitioners have some intuitive familiarity
with the Monte Carlo method from their work. At an elementary level, it is
a surprisingly simple concept, but it can be computationally expensive to
use. It is easy to code Monte Carlo analyses that take hours or even days
to run. To speed up analyses—to make them run in minutes as opposed to
days—users need to employ techniques such as variance reduction. These
techniques are easy to learn, but they are NOT intuitive. To use them,
users need a sophisticated understanding of how and why the Monte Carlo
method works. In this article, I introduce the Monte Carlo method from
such a standpoint. I also close the article with some recommended books.
Credit for inventing the Monte Carlo method often
goes to Stanislaw Ulam, a Polish born mathematician who worked for John
von Neumann on the United States’ Manhattan Project during World War II.
Ulam is primarily known for designing the hydrogen bomb with Edward Teller
in 1951. He invented the Monte Carlo method in 1946 while pondering the
probabilities of winning a card game of solitaire. Quoted in Eckhardt
(1987), Ulam describes the incident as:
The first thoughts and attempts I made to practice [the
Monte Carlo Method] were suggested by a question which occurred to me in
1946 as I was convalescing from an illness and playing solitaires. The
question was what are the chances that a Canfield solitaire laid out with
52 cards will come out successfully? After spending a lot of time trying
to estimate them by pure combinatorial calculations, I wondered whether a
more practical method than “abstract thinking” might not be to lay it out
say one hundred times and simply observe and count the number of
successful plays. This was already possible to envisage with the beginning
of the new era of fast computers, and I immediately thought of problems of
neutron diffusion and other questions of mathematical physics, and more
generally how to change processes described by certain differential
equations into an equivalent form interpretable as a succession of random
operations. Later … [in 1946, I] described the idea to John von Neumann,
and we began to plan actual calculations.
The Monte Carlo method, as it is understood today,
encompasses any technique of statistical sampling employed to approximate
solutions to quantitative problems. Ulam did not invent statistical
sampling. This had been employed to solve quantitative problems before,
with physical processes such as dice tosses or card draws being used to
generate samples. W. S. Gossett, who published under the pen name
“Student,” randomly sampled from height and middle finger measurements of
3,000 criminals to simulate two correlated normal distributions. He
discusses this methodology in both Student (1908a) and Student (1908b). Ulam’s contribution was to recognize the potential for the newly invented
electronic computer to automate such sampling. Working with John von
Neuman and Nicholas Metropolis, he developed algorithms for computer
implementations, as well as exploring means of transforming non-random
problems into random forms that would facilitate their solution via
statistical sampling. This work transformed statistical sampling from a
mathematical curiosity to a formal methodology applicable to a wide
variety of problems. It was Metropolis who named the new methodology after
the casinos of Monte Carlo. Ulam and Metropolis published the
first paper
on the Monte Carlo method in 1949.
A standard way to introduce the Monte Carlo method is
with the problem of Buffon's needle. This is a charming problem that leads
directly to some important (not all intuitive!) conclusions.
More than 200 years before Metropolis coined the name “Monte Carlo
method,” George Louis Leclerc, Comte de Buffon, proposed the following
problem. If a needle of length l is dropped at random on the middle
of a horizontal surface ruled with parallel lines a distance d >
l apart, what is the probability that the needle will cross one of the
lines?
To solve
the problem, consider Exhibit 1.
The positioning of the needle relative to nearby
lines can be described with a random vector which has components
and
. The
random vector (A,
) is
uniformly distributed on the region [0,d)
[0, ).
Accordingly, it has probability density function
.
The probability that the needle will cross one of the lines is
given by the integral
 |
[1] |
This integral is easily solved to obtain the probability
as
 |
[2] |
In this solution, the great mathematician Laplace
perceived an amusing, if inefficient means of approximating the number
.
Suppose Buffon’s experiment is performed with the needle being dropped
n times. Let M be the random variable for the number of times
the needle crosses a line, then
 |
[3] |
where E indicated an
expectation. Note that
formulas [2] and [3] represent the
same probability. Setting them equal to each other and rearranging, we
obtain an expression for
:
 |
[4] |
This means that
 |
[5] |
is a statistical estimator for
. If we
drop the needle n times, and it crosses a line m times, we
can substitute the result m for the random variable M in [5].
The number we obtain will be an estimate for
.
In 1864, Captain O. C. Fox performed this experiment
three times to occupy himself while recovering from wounds. His results
are shown in Exhibit 2:
Fox’s results illustrate two important issues for the Monte Carlo method.
After obtaining a poor approximation for
in his
first experiment, Fox modified his subsequent experiments by applying a
slight rotary motion to the ruled surface between drops. He did this to
eliminate any bias arising from his position while dropping the needle.
His approximations in subsequent experiments were improved. This same
need to eliminate bias exists today with computer implementations of the
Monte Carlo method, although such biases now arise from inappropriate
pseudorandom number generators instead of posture.
In his third experiment, Fox used a five
inch needle and made the lines on the surface just two inches apart.
Now, the needle could cross as many as three lines in a single toss. In
590 tosses, Fox obtained 939 line crossings. Doing so entailed little
more effort than his first two experiments, but it yielded a better
approximation for
. Fox’s
innovation was a precursor of today’s variance reduction techniques.
This example also illustrates another point. Many practitioners who first
experience the Monte Carlo method on the job see it being used to solve
inherently probabilistic problems, such as
derivatives pricing or weather
forecasting. They assume the use of random sampling makes it applicable
only to probabilistic problems. As our example illustrates, this is not
true. There is nothing probabilistic about estimating the number
!
To understand the Monte Carlo method theoretically, it is useful to think
of it as a general technique of numerical integration. It can be shown, at
least in a trivial sense, that every application of the Monte Carlo method
can be represented as a definite integral. As one example, consider our
needle dropping experiment. It is nothing more than an elaborate means of estimating the
integral [1].
Suppose we need to evaluate a multi-dimensional definite integral of the
form:
 |
[6] |
Most integrals can be converted to this form with a
suitable change of variables, so we can consider this to be a general
application suitable for the Monte Carlo method.
The integral represents a non-random problem, but the
Monte Carlo method approximates a solution by introducing a random vector
U that is uniformly distributed on the region of
integration. Applying the function f to U, we obtain
a random variable f(U). This has
expectation:
 |
[7] |
where
is the
probability density function of U. Because
equals 1
on the region of integration, [7] becomes
 |
[8] |
Comparing [6] and [8],
we obtain a probabilistic expression for the integral
:
 |
[9] |
so random
variable f(U) has mean
and some
standard deviation
.
We define
 |
[10] |
as an
unbiased estimator for
with
standard error
.
This is a little unconventional, since [10] is an
estimator that depends upon a sample {U[1]}of
size one, but it is a valid estimator nonetheless.
To estimate
with a standard error lower than
, let’s
generalize our estimator to accommodate a larger sample {U[1],
U [2], … , U [m]}.
Applying the function f to each of these yields m
independent and identically distributed (IID) random variables f(U
[1]), f(U [2]), … , f(U
[m]), each with expectation
and
standard deviation
. The
generalization of [10]
 |
[11] |
is an unbiased estimator for
with
standard error:
 |
[12] |
If we have a realization {u[1],
u[2], … , u[m]}
for our sample, we may estimate
as
 |
[13] |
We call [11] the
crude Monte Carlo estimator.
Formula [12] for its standard error is important for two reasons. First,
it tells us that the standard error of a Monte Carlo analysis decreases
with the square root of the sample size. If we quadruple the number of
realizations used, we will half the standard error. Second, standard error
does not depend upon the dimensionality of the integral [6].
Most techniques of numerical integration—such as the trapezoidal rule or
Simpson's method—suffer from the
curse of dimensionality. When generalized to multiple dimensions,
the number of computations required to apply them increases exponentially
with the dimensionality of the integral. For this reason, such methods
cannot be applied to integrals of more than a few dimensions. The Monte
Carlo method does not suffer from the curse of dimensionality. It is as
applicable to a 1000-dimensional integral as it is to a one-dimensional
integral.
While increasing the sample size is one technique for
reducing the standard error of a Monte Carlo analysis, doing so can be
computationally expensive. A better solution is to employ some technique
of variance reduction. These techniques incorporate additional information
about the analysis directly into the estimator. This allows them to make
the Monte Carlo estimator more deterministic, and hence have a lower
standard error.
Standard techniques of variance reduction include
antithetic variates,
control variates,
importance sampling,
and
stratified sampling.
Let's describe the method of control variates.
Consider
crude Monte Carlo estimator:
 |
[14] |
Let
be a
real
valued function for which the mean
 |
[15] |
is known.
We shall
refer to the random variable
(U)
as a control variate. Consider the random variable f*(U)
based upon this control variate:
 |
[16] |
for some constant c. By construction,
 |
[17] |
so we
can estimate
with the Monte Carlo estimator
 |
[18] |
This will
have a lower standard error than crude estimator [14]
if the standard deviation
* of
is smaller than the standard deviation
of f(U).
This will happen if
(U)
has a high correlation
with the
random variable f(U), in which case random variables
c (U)
and f(U) will tend to offset each other in [18].
We formalize this observation by calculating
where
is
the standard deviation of
(U).
Obviously,
* will
be smaller than
if
 |
[22] |
It can be
shown that
* is
minimized by setting
 |
[23] |
in which case,
from [21],
 |
[24] |
Often,
and
are unknown, which makes determining the optimal value [24]
for c problematic. We can estimate
and
with a separate Monte Carlo analysis. Alternatively, if
closely approximates f, c might simply be set equal to 1.
As the above
description indicates, the key to the method of control variates is
finding a function
that
closely approximates f, and for which E[ (U)]
is easy to calculate.
|
|
 |
|
|
|
|
 |
 |
|
|
|
 |
|
For books on the Monte Carlo Method,
see book reviews under
Mathematics–Monte Carlo Method. In particular, Rubinstein (1981)
offers an excellent introduction. Fishman (1996)
is the authoritative reference. Glasserman (2003) is an advanced
book on use of the Monte Carlo method in financial engineering.
|
|
|
|
 |
|
Eckhardt, Roger
(1987). Stan Ulam, John von Neumann, and the Monte Carlo method,
Los Alamos Science, Special Issue (15), 131-137.
Metropolis, Nicholas and Stanislaw Ulam
(1949). The Monte Carlo method, Journal of the American
Statistical Association, 44 (247), 335-341.
Student (1908a). The probable error
of a mean. Biometrika, 6, 1-25.
Student
(1908b). Probable error of a correlation coefficient.
Biometrika, 6, 302-310. |
|
|
|
 |
 |
Ads by Contingency Analysis
|
|
|
 |
|
|
|
|
|