Multicollinear


A random vector is multicollinear if it is "almost" singular. Let's consider an example from finance.

Suppose we are analyzing the market risk in a natural gas trading portfolio. Random variables represent tomorrow's values for each price the portfolio is exposed to. The portfolio holds New York Mercantile Exchange (NYMEX) Henry Hub futures out to 24 months, so there are 24 futures prices. It also has forward positions out to 18 months for 30 delivery points, for another 540 prices. In total, our model depends upon a vector of 564 random variables!

Based upon a time series analysis of historical price data, we construct a 564 × 564 covariance matrix for our random vector of prices. Gazing at the 318,096 variances and covariances of our matrix, we wonder: Do we really need all these numbers?
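The counts quoted above can be checked with a little arithmetic. The last line is an added observation rather than something stated in the example: because a covariance matrix is symmetric, only the entries on or above the diagonal are distinct.

```latex
\begin{align*}
n &= 24 + 18 \times 30 = 564 && \text{(futures prices plus forward prices)} \\
n^2 &= 564^2 = 318{,}096 && \text{(all entries of the covariance matrix)} \\
\tfrac{1}{2}\,n(n+1) &= \tfrac{1}{2}\cdot 564 \cdot 565 = 159{,}330 && \text{(distinct entries, by symmetry)}
\end{align*}
```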

Intuitively, we know that the random variables are interdependent. Prices for 6-month and 7-month Transco Zone 2 delivery are highly correlated. So are 3-month prices for adjacent Transco Zones 1 and 2. Because of such interdependencies, it is conceivable that our random vector is singular, but this is probably not the case. Singularity arises infrequently in applications. A more common situation is "almost" singularity, which is known as multicollinearity.

 
   

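We illustrate with two four-dimensional random vectors. Random vector X is singular. Its first three components $X_1$, $X_2$ and $X_3$ are uncorrelated, each with mean 0 and standard deviation 1. The fourth component $X_4$ equals $X_1 + X_2 + X_3$. The covariance matrix for X is

$$\begin{pmatrix} 1 & 0 & 0 & 1 \\ 0 & 1 & 0 & 1 \\ 0 & 0 & 1 & 1 \\ 1 & 1 & 1 & 3 \end{pmatrix}$$

[1]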

Random vector Z is multicollinear. Like X, its first three components $Z_1$, $Z_2$ and $Z_3$ are uncorrelated, each with mean 0 and standard deviation 1. The fourth component $Z_4$ equals $Z_1 + Z_2 + Z_3 + E$, where E is a "noise" random variable that is uncorrelated with $Z_1$, $Z_2$ and $Z_3$ and has mean 0 and standard deviation 0.001. Except for the addition of "noise" E, our random vector Z is identical to our random vector X. Its covariance matrix is

$$\begin{pmatrix} 1 & 0 & 0 & 1 \\ 0 & 1 & 0 & 1 \\ 0 & 0 & 1 & 1 \\ 1 & 1 & 1 & 3.000001 \end{pmatrix}$$

[2]

The covariance matrix of X is singular. It has determinant 0. The covariance matrix of Z is not singular, but with a determinant of 0.000001, it is "almost" singular. The random variable $Z_4$ is almost a linear polynomial of $Z_1$, $Z_2$ and $Z_3$, but not quite. We added just enough random "noise" to make it linearly independent. We say a random vector is multicollinear if it is "almost" singular in this sense.
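The two determinants can be checked with a short calculation. The sketch below is only an illustration: the helper names `determinant` and `covariance_matrix` are hypothetical, and exact rational arithmetic is a choice made here so that the tiny determinant of Z is not blurred by floating-point rounding. The matrix entries follow from the definitions above: the fourth component has covariance 1 with each of the first three components and variance 3 plus the variance of the noise.

```python
from fractions import Fraction


def determinant(matrix):
    """Exact determinant by Gaussian elimination over Fractions."""
    a = [[Fraction(x) for x in row] for row in matrix]
    n = len(a)
    det = Fraction(1)
    for col in range(n):
        # Find a row at or below the diagonal with a nonzero pivot.
        pivot = next((r for r in range(col, n) if a[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)  # no pivot available: the matrix is singular
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det  # a row swap flips the sign of the determinant
        det *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return det


def covariance_matrix(noise_sd):
    """Covariance of (V1, V2, V3, V1 + V2 + V3 + E); E has std dev noise_sd."""
    noise_var = Fraction(noise_sd) ** 2
    return [
        [1, 0, 0, 1],
        [0, 1, 0, 1],
        [0, 0, 1, 1],
        [1, 1, 1, 3 + noise_var],
    ]


det_x = determinant(covariance_matrix("0"))      # singular vector X (no noise)
det_z = determinant(covariance_matrix("0.001"))  # multicollinear vector Z
print(det_x)         # 0
print(float(det_z))  # 1e-06
```

Passing the standard deviation as a decimal string keeps the value 0.001 exact when it is converted to a fraction.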

Realizations of a multicollinear random vector tend to cluster near a plane within $\mathbb{R}^n$. They don't all lie in that plane, but they "almost" do. This is illustrated with realizations of a two-dimensional multicollinear random vector Z in Exhibit 1.

Exhibit 1: Realizations of a multicollinear two-dimensional random vector Z. Component $Z_2$ is "almost" a linear polynomial of component $Z_1$.
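Realizations like those described in Exhibit 1 can be simulated. The sketch below is an illustration under assumed parameters, not the data behind the exhibit: the slope, intercept, noise level and sample size are all hypothetical, and `sample_correlation` is a helper defined here. It draws a first component, sets the second to a linear polynomial of the first plus a little independent noise, and measures how close the points stay to the line.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

# Hypothetical parameters, chosen only for illustration.
SLOPE, INTERCEPT, NOISE_SD = 2.0, 1.0, 0.05

z1 = [random.gauss(0.0, 1.0) for _ in range(1000)]
# Z2 is a linear polynomial of Z1 plus a small amount of independent noise.
z2 = [INTERCEPT + SLOPE * x + random.gauss(0.0, NOISE_SD) for x in z1]


def sample_correlation(xs, ys):
    """Sample correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


rho = sample_correlation(z1, z2)
corr_det = 1.0 - rho ** 2  # determinant of the 2 x 2 sample correlation matrix
# Largest vertical distance of any realization from the line.
max_gap = max(abs(y - (INTERCEPT + SLOPE * x)) for x, y in zip(z1, z2))
print(rho, corr_det, max_gap)
```

With these assumed parameters the sample correlation comes out just below 1, so the determinant of the sample correlation matrix is close to 0, and no realization strays far from the line.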

We may think of a random vector Z as being "almost" singular if its covariance matrix $\Sigma$ has a determinant $|\Sigma|$ close to 0. In practical applications, the magnitude of this determinant will depend upon the units in which components of Z are measured. A more reasonable test for multicollinearity is to consider the determinant of the correlation matrix of Z. This determinant will always be in the interval [0,1]. If it is very close to 0, this is an indication of multicollinearity. Obviously, if it equals 0, Z is singular.
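The dependence on units is easiest to see in two dimensions. The notation in this sketch is assumed for illustration: $\Sigma$ is the covariance matrix and $R$ the correlation matrix of a two-dimensional Z, $\sigma_1$ and $\sigma_2$ are the standard deviations of its components, $\rho$ is their correlation, and $c \neq 0$ is an arbitrary unit conversion factor applied to the first component.

```latex
\begin{align*}
|\Sigma| &=
\begin{vmatrix}
\sigma_1^2 & \rho\,\sigma_1\sigma_2 \\
\rho\,\sigma_1\sigma_2 & \sigma_2^2
\end{vmatrix}
= \sigma_1^2\sigma_2^2\,(1 - \rho^2),
&
|R| &=
\begin{vmatrix}
1 & \rho \\
\rho & 1
\end{vmatrix}
= 1 - \rho^2 .
\\[1ex]
\text{After } Z_1 \to c\,Z_1:\quad
|\Sigma'| &= c^2\sigma_1^2\sigma_2^2\,(1 - \rho^2) = c^2\,|\Sigma|,
&
|R'| &= 1 - \rho^2 = |R| .
\end{align*}
```

Quoting the first component in different units can make $|\Sigma|$ arbitrarily large or small, while $|R|$ depends only on $\rho$ and is close to 0 exactly when $\rho$ is close to 1 or -1.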

   

As described in the article on positive definite matrices, the dimensionality of a singular random vector X can be reduced with a simple change of variables. No information is lost, as we only eliminate extraneous random variables. Multicollinearity is more problematic. Reducing the dimensionality of a multicollinear random vector Z requires an approximation that somehow identifies and discards minor randomness that is preventing the covariance matrix from being singular.

This is the situation we face with our natural gas portfolio. We feel confident that the natural gas market can reasonably be modeled with fewer than 564 random variables, but we can't arbitrarily discard random variables! If our covariance matrix isn't singular, how can we replace our 564 random variables with a smaller set that conveys essentially the same information? Principal component analysis provides a solution.

Exercises

Which of the following covariance matrices are singular? Which are multicollinear?



Related Internal Links

Cholesky matrix: A lower-triangular matrix that acts as a matrix "square root" for a positive definite matrix.

correlation: A parameter that indicates the tendency for two random variables to "move together" or "co-vary."

eigenvalue, eigenvector: Concepts from linear algebra.

joint normal distribution: A multivariate distribution with normal marginal distributions.

linear polynomial of a random vector: A random variable or random vector that is defined as a linear polynomial of a random vector.

positive definite matrix: A real symmetric matrix, all of whose eigenvalues are real and positive.

principal component analysis: A technique for orthogonalizing a random vector.


http://www.riskglossary.com

copyright © Glyn A. Holton, 2006
