Central Limit Theorem


 


The normal distribution is useful for modeling various random quantities, such as people’s heights, asset returns, and test scores. This is no coincidence. If a process is additive, reflecting the combined influence of multiple random occurrences, the result is likely to be approximately normal. This famous result is known as the central limit theorem.

In a nutshell, the central limit theorem states that a sum of many random variables, typically assumed independent, will have a distribution that is approximately normal. The means and standard deviations of the random variables must exist, and other modest conditions must also be met. In practical applications, those modest conditions are met more often than not.

For an example, consider several independent U(–1, 1) random variables $X_1, X_2, \ldots$ (see the article uniform distribution for an explanation of this notation). Let $\bar{X}_n$ be the random variable equal to the average of the first n of the $X_i$:

$$\bar{X}_n = \frac{1}{n} \sum_{i=1}^{n} X_i$$   [1]

Exhibit 1 shows the probability density function (PDF) of $\bar{X}_n$ for n = 1, 2, 3, 4, and 5.

Exhibit 1

A random variable that equals an average of independent U(–1, 1) random variables becomes more like a normal random variable as the number of U(–1, 1) random variables in the average increases. This is illustrated with a progression of PDFs. Note that the images have been scaled so all have similar widths.

 
   

The first image in Exhibit 1 is simply the PDF of a U(–1, 1) random variable. The second is the PDF of a random variable that is an average of two independent U(–1, 1) random variables. That PDF has a triangular shape. Next, with an average of three independent U(–1, 1) random variables, the PDF takes on a bell shape. As n continues to grow, the shape of the PDF becomes increasingly like that of the normal distribution. This graphically illustrates the central limit theorem. Let's formalize this.
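The progression described above can also be checked numerically. The following is a minimal simulation sketch, not the method used to draw Exhibit 1: it averages n simulated U(–1, 1) draws and measures the largest gap between the empirical CDF and the CDF of a normal distribution with the same mean and standard deviation. The function name max_cdf_gap, the sample size, the seed, and the evaluation grid are all illustrative assumptions.

```python
import bisect
import math
import random
from statistics import NormalDist


def max_cdf_gap(n, samples=50_000, seed=0):
    """Largest gap, over a grid of points, between the empirical CDF of the
    average of n independent U(-1, 1) draws and the matching normal CDF."""
    rng = random.Random(seed)
    averages = sorted(
        sum(rng.uniform(-1.0, 1.0) for _ in range(n)) / n
        for _ in range(samples)
    )
    # A U(-1, 1) variable has mean 0 and variance 1/3, so the average of n
    # of them has mean 0 and standard deviation 1 / sqrt(3 n).
    normal = NormalDist(mu=0.0, sigma=1.0 / math.sqrt(3.0 * n))
    grid = [i / 100.0 for i in range(-100, 101)]  # assumed evaluation points
    gap = 0.0
    for x in grid:
        empirical = bisect.bisect_right(averages, x) / samples
        gap = max(gap, abs(empirical - normal.cdf(x)))
    return gap


if __name__ == "__main__":
    for n in range(1, 6):
        print(n, round(max_cdf_gap(n), 4))
```

Under these assumptions the gap for n = 1 should be clearly larger than the gap for n = 5. For intermediate values of n the differences are small enough that sampling noise can blur the ordering.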

Let X be an n-dimensional random vector with independent and identically distributed (IID) components $X_1, \ldots, X_n$. It doesn’t matter what their common distribution is as long as its mean $\mu$ and standard deviation $\sigma$ exist. Let $\bar{X}_n$ be the random variable equal to the average of the $X_i$. As a linear polynomial of a random vector, $\bar{X}_n$ has mean $\mu$ and standard deviation $\sigma/\sqrt{n}$. Accordingly, the normalized average

$$Z_n = \frac{\bar{X}_n - \mu}{\sigma/\sqrt{n}}$$   [2]

has mean 0 and standard deviation 1. The central limit theorem tells us that, for large n, $Z_n$ is approximately standard normal. Specifically, it states that, for any constant x,

$$\lim_{n \to \infty} \Pr(Z_n \le x) = \Phi(x)$$   [3]

where $\Phi$ is the standard normal cumulative distribution function (CDF).
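The mean and standard deviation of the average follow from linearity of expectation and from independence. A short derivation is sketched below. The symbols $X_i$ for the components, $\bar{X}_n$ for their average, $Z_n$ for the normalized average, and $\mu$ and $\sigma$ for the common mean and standard deviation are notation assumed here for illustration.

```latex
\begin{align*}
E[\bar{X}_n] &= \frac{1}{n}\sum_{i=1}^{n} E[X_i] = \mu, \\
\operatorname{Var}(\bar{X}_n) &= \frac{1}{n^2}\sum_{i=1}^{n} \operatorname{Var}(X_i)
  = \frac{\sigma^2}{n}
  \quad \text{(independence removes the cross terms)}, \\
\operatorname{sd}(\bar{X}_n) &= \frac{\sigma}{\sqrt{n}}.
\end{align*}
% Worked case: for U(-1, 1) components, \mu = 0 and \sigma^2 = (1 - (-1))^2 / 12 = 1/3, so
\[
Z_n = \frac{\bar{X}_n - 0}{1/\sqrt{3n}} = \sqrt{3n}\,\bar{X}_n .
\]
```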

There are many versions of the central limit theorem. Several of these place additional restrictions on the $X_i$ but do not require that they be identically distributed. The additional restrictions vary, but are generally designed to prevent one or a handful of random variables from dominating the average, which might happen if one random variable has a standard deviation far greater than the rest.
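To see why such restrictions matter, here is a rough simulation sketch under assumed, illustrative distributions. It compares a balanced sum of ten U(–1, 1) terms with a sum in which a single coin flip on {–10, +10} dominates nine U(–1, 1) terms. All means are zero, so normalizing the sum is equivalent to normalizing the average. The helper names, sample size, seed, and grid are hypothetical choices.

```python
import bisect
import math
import random
from statistics import NormalDist


def max_gap(draw_sum, total_sd, samples=50_000, seed=0):
    """Largest gap on a grid between the empirical CDF of the normalized
    sum and the standard normal CDF."""
    rng = random.Random(seed)
    zs = sorted(draw_sum(rng) / total_sd for _ in range(samples))
    phi = NormalDist().cdf
    grid = [i / 10.0 for i in range(-30, 31)]  # assumed evaluation points
    return max(abs(bisect.bisect_right(zs, x) / samples - phi(x)) for x in grid)


def balanced_sum(rng):
    # Ten U(-1, 1) terms, each with variance 1/3.
    return sum(rng.uniform(-1.0, 1.0) for _ in range(10))


def dominated_sum(rng):
    # Nine U(-1, 1) terms plus one coin flip on {-10, +10} (variance 100).
    return sum(rng.uniform(-1.0, 1.0) for _ in range(9)) + rng.choice((-10.0, 10.0))


# Standard deviations of the two sums: sqrt(10/3) and sqrt(9/3 + 100).
balanced_gap = max_gap(balanced_sum, math.sqrt(10 / 3))
dominated_gap = max_gap(dominated_sum, math.sqrt(9 / 3 + 100))
print("balanced:", round(balanced_gap, 4))
print("dominated:", round(dominated_gap, 4))
```

In the dominated case the normalized sum stays close to the two-humped shape of the coin flip, so its gap from the standard normal CDF should be far larger than in the balanced case.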

In Exhibit 2, probability distributions are illustrated for five independent random variables $X_1, \ldots, X_5$. All five distributions have mean 0 and standard deviation 1 and are dramatically non-normal. They were selected arbitrarily, but their normalized average is approximately normal.

Exhibit 2

In this example, the normalized average of five independent random variables is still approximately normal despite the fact that the five random variables have very different distributions.
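A similar experiment can be sketched in code. The five distributions below are stand-ins chosen for illustration, since the ones pictured in Exhibit 2 are not specified in the text. Each has mean 0 and standard deviation 1. The function names, sample size, seed, and grid are likewise assumptions.

```python
import bisect
import math
import random
from statistics import NormalDist


def draw_components(rng):
    """One draw from each of five illustrative distributions,
    all with mean 0 and standard deviation 1."""
    root3 = math.sqrt(3.0)
    root2 = math.sqrt(2.0)
    return [
        rng.uniform(-root3, root3),                       # uniform
        rng.expovariate(1.0) - 1.0,                       # shifted exponential (skewed)
        rng.choice((-1.0, 1.0)),                          # fair coin on {-1, +1}
        2.0 if rng.random() < 0.2 else -0.5,              # lopsided two-point
        rng.expovariate(root2) - rng.expovariate(root2),  # Laplace (heavier tails)
    ]


def normalized_average(rng):
    xs = draw_components(rng)
    n = len(xs)
    # The average has mean 0 and standard deviation 1 / sqrt(n).
    return (sum(xs) / n) * math.sqrt(n)


def max_gap_to_standard_normal(samples=50_000, seed=0):
    """Largest gap on a grid between the empirical CDF of the normalized
    average and the standard normal CDF."""
    rng = random.Random(seed)
    zs = sorted(normalized_average(rng) for _ in range(samples))
    phi = NormalDist().cdf
    grid = [i / 10.0 for i in range(-30, 31)]  # assumed evaluation points
    return max(abs(bisect.bisect_right(zs, x) / samples - phi(x)) for x in grid)


if __name__ == "__main__":
    print(round(max_gap_to_standard_normal(), 4))
```

With only five terms, two of them skewed, the match is approximate rather than exact, so a gap of a few percentage points would not be surprising.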

Other versions of the central limit theorem modestly weaken the independence assumption for the $X_i$. The central limit theorem also generalizes to multiple dimensions.


 

Related Internal Links

Cornish-Fisher expansion: A formula for approximating quantiles of a random variable based only on its first few cumulants.

joint normal distribution: A multivariate distribution with normal marginal distributions.

kurtosis: A parameter describing the peakedness and tails of a distribution.

linear polynomial of a random vector: A random variable or random vector that is defined as a linear polynomial of a random vector.

normal distribution: Perhaps the most important probability distribution for probability and statistics.

stable Paretian distribution: A non-normal stable distribution.

standard deviation: A parameter describing the dispersion of a distribution.

uniform distribution: A continuous probability distribution that has constant probability on a finite interval.


 

Related Books

Salsburg (2001) is a wonderful history of probability and statistics that describes the origins of the central limit theorem. DeGroot and Schervish (2002) is a standard university probability text. Holton (2003) describes the use of the central limit theorem in measuring portfolio market risk.

David Salsburg (2001). Lady Tasting Tea.

Morris H. DeGroot and Mark J. Schervish (2002). Probability and Statistics.

Glyn Holton (2003). Value-at-Risk: Theory and Practice.

 


 


website: http://www.contingencyanalysis.com
glossary direct link: http://www.riskglossary.com
copyright © Contingency Analysis, 2005