The normal distribution is useful for modeling various random quantities,
such as people’s heights, asset returns, and test scores. This is no
coincidence. If a process is additive, reflecting the combined influence
of multiple random occurrences, the result is likely to be approximately
normal. This famous result is known as the central limit theorem.
In a nutshell, the central limit theorem states that a sum of many
independent random variables will have a distribution that is
approximately normal. The means and standard deviations of the random
variables must exist, and other modest conditions must also be met. In
practical applications, those modest conditions are met more often than not.
For an example, consider several independent U(–1, 1) random variables
$X_1, X_2, X_3, \ldots$ (see the article uniform distribution for an
explanation of this notation). Let $\bar{X}_n$ be the random variable
equal to the average of the first n of the $X_i$:

$$\bar{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_i$$ [1]
Exhibit 1 shows the probability density function (PDF) of this average
for n = 1, 2, 3, 4 and 5.
Exhibit 1: A random variable that equals an average of independent
U(–1, 1) random variables becomes more like a normal random variable as
the number of U(–1, 1) random variables in the average increases. This is
illustrated with a progression of PDFs. Note that the images have been
scaled so that all have similar widths.
The first image in Exhibit 1 is simply the PDF of a U(–1,
1) random variable. The second is the PDF of a random variable that is an
average of two independent U(–1, 1) random variables. That PDF has
a triangular shape. Next, with an average of three independent U(–1,
1) random variables, the PDF takes on a bell shape. As n continues
to grow, the shape of the PDF becomes increasingly like that of the normal
distribution. This graphically illustrates the central limit theorem.
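The progression in Exhibit 1 can also be checked numerically. The following is a minimal sketch, not part of the original article, that simulates averages of n independent U(–1, 1) random variables and estimates the probability that the average falls within one standard deviation of its mean. For a normal random variable that probability is about 0.6827, so the estimates should drift toward that value as n grows. The function names, trial count, and seed are illustrative assumptions.

```python
import math
import random
from statistics import NormalDist


def average_of_uniforms(n, rng):
    """Return the average of n independent U(-1, 1) draws."""
    return sum(rng.uniform(-1.0, 1.0) for _ in range(n)) / n


def coverage_within_one_sd(n, trials=50_000, seed=0):
    """Estimate P(|average| <= one standard deviation) by simulation."""
    rng = random.Random(seed)
    # A U(-1, 1) variable has variance 1/3, so the average of n has variance 1/(3n).
    sd = math.sqrt(1.0 / (3.0 * n))
    hits = sum(abs(average_of_uniforms(n, rng)) <= sd for _ in range(trials))
    return hits / trials


# Reference value for a normal random variable (about 0.6827).
normal_value = NormalDist().cdf(1.0) - NormalDist().cdf(-1.0)

if __name__ == "__main__":
    for n in (1, 2, 3, 4, 5):
        print(n, round(coverage_within_one_sd(n), 4), round(normal_value, 4))
```

For n = 1 the exact probability is 1/√3, roughly 0.577, which is far from the normal value; by n = 5 the simulated estimate should be much closer.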
Let's formalize this.
Let X be an n-dimensional random vector with independent and identically
distributed (IID) components $X_1, X_2, \ldots, X_n$. It doesn’t matter
what their common distribution is as long as its mean $\mu$ and standard
deviation $\sigma$ exist. Let $\bar{X}$ be the random variable equal to
the average of the $X_i$. As a linear polynomial of a random vector,
$\bar{X}$ has mean $\mu$ and standard deviation $\sigma/\sqrt{n}$.
Accordingly, the normalized average

$$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$$ [2]

has mean 0 and standard deviation 1. The central limit theorem tells us
$Z$ is approximately standard normal. Specifically, it states that, for
any constant x,

$$\lim_{n \to \infty} \Pr(Z \le x) = \Phi(x)$$ [3]

where $\Phi$ is the standard normal cumulative distribution function (CDF).
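The limit in [3] can be illustrated by simulation. The sketch below is an assumption-laden example rather than anything from the original article: it uses IID exponential random variables with rate 1 (mean 1, standard deviation 1) as the common distribution, forms the normalized average, and compares the empirical probability that it is at most x with the standard normal CDF. The choice of distribution, the function names, and the evaluation point x = 0.5 are all illustrative.

```python
import math
import random
from statistics import NormalDist


def normalized_average(n, rng, mu=1.0, sigma=1.0):
    """Normalized average of n IID exponential(1) draws (mean 1, sd 1)."""
    avg = sum(rng.expovariate(1.0) for _ in range(n)) / n
    return (avg - mu) / (sigma / math.sqrt(n))


def empirical_cdf_at(x, n, trials=20_000, seed=0):
    """Estimate P(normalized average <= x) from simulated draws."""
    rng = random.Random(seed)
    return sum(normalized_average(n, rng) <= x for _ in range(trials)) / trials


phi = NormalDist().cdf  # standard normal CDF

if __name__ == "__main__":
    x = 0.5
    for n in (1, 5, 25, 100):
        print(n, round(empirical_cdf_at(x, n), 3), round(phi(x), 3))
```

With n = 1 the normalized average is just a shifted exponential, so the estimate sits well above the normal value; the gap should shrink as n increases.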
There are many versions of the central limit theorem.
Several of these place additional restrictions on the individual random variables
but do not require that they be identically distributed. The additional
restrictions vary, but are generally designed to prevent one or a handful
of random variables from dominating the average, which might happen if one
random variable has a standard deviation far greater than the rest.
In Exhibit 2, probability distributions are illustrated
for five independent random variables $X_1, X_2, \ldots, X_5$.
All five distributions have mean 0 and standard deviation 1 and are
dramatically non-normal. They were selected arbitrarily, but their
normalized average is approximately normal.
Exhibit 2: In this example, the normalized average of five independent
random variables is still approximately normal despite the fact that the
five random variables have very different distributions.
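A similar experiment can be run in code. The five distributions pictured in Exhibit 2 are not specified in the text, so the sketch below substitutes five hypothetical ones, each constructed to have mean 0 and standard deviation 1: a uniform, a shifted exponential, a fair coin on {–1, 1}, a skewed two-point distribution, and a Laplace. It then compares the empirical CDF of their normalized average with the standard normal CDF. All names and distribution choices are assumptions made for illustration.

```python
import math
import random
from statistics import NormalDist, fmean, pvariance

SQRT2 = math.sqrt(2.0)
SQRT3 = math.sqrt(3.0)

# Five hypothetical distributions, each with mean 0 and standard deviation 1.
# They are stand-ins, not the distributions shown in Exhibit 2.
SAMPLERS = [
    lambda rng: rng.uniform(-SQRT3, SQRT3),                       # uniform
    lambda rng: rng.expovariate(1.0) - 1.0,                       # shifted exponential
    lambda rng: rng.choice((-1.0, 1.0)),                          # fair coin on {-1, 1}
    lambda rng: 2.0 if rng.random() < 0.2 else -0.5,              # skewed two-point
    lambda rng: rng.expovariate(SQRT2) - rng.expovariate(SQRT2),  # Laplace
]


def normalized_average(rng):
    """Average one draw from each sampler, then divide by the average's sd.

    The average has mean 0 and standard deviation 1/sqrt(5), so the
    normalized average equals the sum divided by sqrt(5).
    """
    total = sum(draw(rng) for draw in SAMPLERS)
    return total / math.sqrt(len(SAMPLERS))


def simulate(trials=20_000, seed=0):
    """Return a list of simulated normalized averages."""
    rng = random.Random(seed)
    return [normalized_average(rng) for _ in range(trials)]


def empirical_cdf(samples, x):
    """Fraction of samples that are at most x."""
    return sum(s <= x for s in samples) / len(samples)


if __name__ == "__main__":
    samples = simulate()
    print("mean", round(fmean(samples), 3), "variance", round(pvariance(samples), 3))
    for x in (-1.0, 0.0, 1.0):
        print(x, round(empirical_cdf(samples, x), 3), round(NormalDist().cdf(x), 3))
```

With only five terms the agreement is approximate rather than exact; some residual skew from the asymmetric components should be expected.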
Other versions of the central limit theorem modestly
weaken the independence assumption for the random variables.
The central limit theorem generalizes to multiple dimensions.