Multicollinear


A random vector is multicollinear if it is "almost" singular. Let's consider an example from finance.

Suppose we are analyzing the market risk in a natural gas trading portfolio. Random variables represent tomorrow’s values for each price the portfolio is exposed to. The portfolio holds New York Mercantile Exchange (NYMEX) Henry Hub futures out to 24 months, so there are 24 futures prices. It also has forward positions out to 18 months for 30 delivery points, for another 540 prices. In total, our model depends upon a vector of 564 random variables!

Based upon a time series analysis of historical price data, we construct a 564 × 564 covariance matrix for our random vector of prices. Gazing at the 318,096 variances and covariances of our matrix, we wonder: Do we really need all these numbers?
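The counts quoted above can be checked with a short calculation. The last line is an addition not stated in the text: because a covariance matrix is symmetric, only the entries on and above the diagonal are distinct.

```latex
\begin{aligned}
\text{prices} &= 24 + 18 \times 30 = 564,\\
\text{matrix entries} &= 564^2 = 318{,}096,\\
\text{distinct entries} &= \frac{564 \times 565}{2} = 159{,}330.
\end{aligned}
```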

Intuitively, we know that the random variables are interdependent. Prices for 6-month and 7-month Transco Zone 2 delivery are highly correlated. So are 3-month prices for adjacent Transco Zones 1 and 2. Because of such interdependencies, it is conceivable that our random vector is singular, but this is probably not the case. Singularity arises infrequently in applications. A more common situation is “almost” singularity, which is known as multicollinearity.

 
   

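We illustrate with two four-dimensional random vectors. Random vector X is singular. Its first three components $X_1$, $X_2$ and $X_3$ are uncorrelated, each with mean 0 and standard deviation 1. The fourth component equals $X_1 + X_2 + X_3$. The covariance matrix for X is

$$
\begin{pmatrix}
1 & 0 & 0 & 1 \\
0 & 1 & 0 & 1 \\
0 & 0 & 1 & 1 \\
1 & 1 & 1 & 3
\end{pmatrix}
\qquad [1]
$$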

Random vector Z is multicollinear. Like X, its first three components $Z_1$, $Z_2$ and $Z_3$ are uncorrelated, each with mean 0 and standard deviation 1. The fourth component equals $Z_1 + Z_2 + Z_3 + E$, where E is a “noise” random variable that is uncorrelated with $Z_1$, $Z_2$ and $Z_3$ and has mean 0 and standard deviation .001. Except for the addition of “noise” E, our random vector Z is identical to our random vector X. Its covariance matrix is

$$
\begin{pmatrix}
1 & 0 & 0 & 1 \\
0 & 1 & 0 & 1 \\
0 & 0 & 1 & 1 \\
1 & 1 & 1 & 3.000001
\end{pmatrix}
\qquad [2]
$$

The covariance matrix of X is singular. It has determinant 0. The covariance matrix of Z is not singular, but with a determinant of .000001, it is “almost” singular. The random variable $Z_4$ is almost a linear polynomial of $Z_1$, $Z_2$ and $Z_3$, but not quite. We added just enough random “noise” to make it linearly independent. We say a random vector is multicollinear if it is “almost” singular in this sense.
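The two determinants can be checked with exact arithmetic. The following is a minimal sketch, assuming the fourth component of each vector is the plain sum of the first three (plus the noise E in the case of Z). The helper `determinant` and the variable names are illustrative and not part of the original article.

```python
from fractions import Fraction


def determinant(matrix):
    """Exact determinant by Gaussian elimination over Fractions."""
    rows = [[Fraction(value) for value in row] for row in matrix]
    n = len(rows)
    det = Fraction(1)
    for col in range(n):
        # Find a row at or below the diagonal with a non-zero pivot.
        pivot = next((r for r in range(col, n) if rows[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)  # no pivot available: the matrix is singular
        if pivot != col:
            rows[col], rows[pivot] = rows[pivot], rows[col]
            det = -det  # a row swap flips the sign of the determinant
        det *= rows[col][col]
        for r in range(col + 1, n):
            factor = rows[r][col] / rows[col][col]
            rows[r] = [a - factor * b for a, b in zip(rows[r], rows[col])]
    return det


# Variance of the noise E: standard deviation .001, squared.
noise_variance = Fraction(1, 1000) ** 2

# Covariance matrix of X (assumed fourth component: X1 + X2 + X3).
cov_x = [
    [1, 0, 0, 1],
    [0, 1, 0, 1],
    [0, 0, 1, 1],
    [1, 1, 1, 3],
]

# Covariance matrix of Z: only the variance of the fourth component changes.
cov_z = [row[:] for row in cov_x]
cov_z[3][3] = 3 + noise_variance

print(determinant(cov_x))         # 0
print(float(determinant(cov_z)))  # 1e-06
```

Using `Fraction` rather than floating point avoids round-off, which matters here because the quantity of interest is itself a tiny difference between two numbers close to 3.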

Realizations of a multicollinear random vector tend to cluster near a plane within $\mathbb{R}^n$. They don’t all lie in that plane, but they “almost” do. This is illustrated with realizations of a two-dimensional multicollinear random vector Z in Exhibit 1.

Exhibit 1: Realizations of a Multicollinear Random Vector

Realizations of a multicollinear two-dimensional random vector Z. Component Z2 is “almost” a linear polynomial of component Z1.
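A picture of this kind can be approximated by simulation. The sketch below draws realizations of a two-dimensional vector whose second component is a linear polynomial of the first plus a small amount of noise. The slope, intercept, noise level, sample size and normal distributions are all hypothetical choices made for illustration; the article does not specify how its exhibit was generated.

```python
import random

random.seed(0)  # fixed seed so the run is repeatable

# Hypothetical line and noise level, chosen only for illustration.
SLOPE, INTERCEPT, NOISE_SD = 2.0, 1.0, 0.05


def draw_realization():
    """One realization (z1, z2) with z2 almost a linear polynomial of z1."""
    z1 = random.gauss(0.0, 1.0)
    noise = random.gauss(0.0, NOISE_SD)
    return z1, SLOPE * z1 + INTERCEPT + noise


def sample_correlation(pairs):
    """Pearson correlation of a list of (a, b) pairs."""
    n = len(pairs)
    mean_a = sum(a for a, _ in pairs) / n
    mean_b = sum(b for _, b in pairs) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in pairs)
    var_a = sum((a - mean_a) ** 2 for a, _ in pairs)
    var_b = sum((b - mean_b) ** 2 for _, b in pairs)
    return cov / (var_a * var_b) ** 0.5


points = [draw_realization() for _ in range(1000)]
rho = sample_correlation(points)

# Vertical distance of each point from the line z2 = SLOPE * z1 + INTERCEPT.
largest_gap = max(abs(z2 - (SLOPE * z1 + INTERCEPT)) for z1, z2 in points)

print(f"sample correlation: {rho:.4f}")
print(f"largest vertical distance from the line: {largest_gap:.3f}")
```

The sample correlation comes out very close to 1, and no point strays far from the line, which is the clustering the exhibit describes.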

We may think of a random vector Z as being “almost” singular if the determinant of its covariance matrix is close to 0. In practical applications, the magnitude of this determinant will depend upon the units in which components of Z are measured. A more reasonable test for multicollinearity is to consider the determinant of the correlation matrix of Z. This determinant will always be in the interval [0,1]. If it is very close to 0, this is an indication of multicollinearity. Obviously, if it equals 0, Z is singular.
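The effect of units can be made concrete with a small sketch. It assumes a four-dimensional vector whose first three components are uncorrelated with unit variance and whose fourth component is their sum plus noise of standard deviation .001. The unit changes in `scales` are hypothetical. The code relies on the identity that the determinant of the correlation matrix equals the determinant of the covariance matrix divided by the product of the variances, which follows from dividing each row and column by the corresponding standard deviation.

```python
from fractions import Fraction


def determinant(m):
    """Exact determinant by Laplace expansion along the first row."""
    if len(m) == 1:
        return m[0][0]
    return sum(
        (-1) ** j * m[0][j]
        * determinant([row[:j] + row[j + 1:] for row in m[1:]])
        for j in range(len(m))
    )


def rescale(cov, scales):
    """Covariance matrix after multiplying component i by scales[i]."""
    n = len(cov)
    return [[scales[i] * scales[j] * cov[i][j] for j in range(n)]
            for i in range(n)]


def correlation_determinant(cov):
    """det(correlation) = det(covariance) / product of the variances."""
    product_of_variances = Fraction(1)
    for i in range(len(cov)):
        product_of_variances *= cov[i][i]
    return determinant(cov) / product_of_variances


# Assumed covariance matrix of the multicollinear vector.
cov_z = [
    [1, 0, 0, 1],
    [0, 1, 0, 1],
    [0, 0, 1, 1],
    [1, 1, 1, 3 + Fraction(1, 1000) ** 2],
]

# Hypothetical change of units for the second and fourth components.
scales = [1, 100, 1, 1000]
rescaled = rescale(cov_z, scales)

print(determinant(cov_z))                 # 1/1000000
print(determinant(rescaled))              # 10000
print(correlation_determinant(cov_z))     # 1/3000001
print(correlation_determinant(rescaled))  # 1/3000001
```

The covariance determinant moves from .000001 to 10,000 under a mere change of units, while the correlation determinant stays at roughly .00000033 in both cases.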

 


As described in the article on positive definite matrices, the dimensionality of a singular random vector X can be reduced with a simple change of variables. No information is lost, as we only eliminate extraneous random variables. Multicollinearity is more problematic. Reducing the dimensionality of a multicollinear random vector Z requires an approximation that somehow identifies and discards the minor randomness that is preventing the covariance matrix from being singular.
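For the singular four-dimensional vector X of the earlier example, one such change of variables can be sketched as follows, assuming its fourth component is the sum of the first three. The symbols $Y$ and $A$ are notation introduced here for illustration.

```latex
X = A\,Y, \qquad
Y = \begin{pmatrix} X_1 \\ X_2 \\ X_3 \end{pmatrix}, \qquad
A = \begin{pmatrix}
1 & 0 & 0 \\
0 & 1 & 0 \\
0 & 0 & 1 \\
1 & 1 & 1
\end{pmatrix}, \qquad
\operatorname{cov}(X) = A\,\operatorname{cov}(Y)\,A^{\mathsf T} = A A^{\mathsf T}
= \begin{pmatrix}
1 & 0 & 0 & 1 \\
0 & 1 & 0 & 1 \\
0 & 0 & 1 & 1 \\
1 & 1 & 1 & 3
\end{pmatrix}.
```

Three random variables carry all of the randomness in X, and the four-dimensional covariance matrix is recovered exactly from the three-dimensional one.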

This is the situation we face with our natural gas portfolio. We feel confident that the natural gas market can reasonably be modeled with fewer than 564 random variables, but we can’t arbitrarily discard random variables! If our covariance matrix isn’t singular, how can we replace our 564 random variables with a smaller set that conveys essentially the same information? Principal component analysis provides a solution.
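The idea can be previewed on the small multicollinear vector Z from the earlier example, assuming its fourth component is $Z_1 + Z_2 + Z_3 + E$. The eigenvalues of its covariance matrix, written here with $\varepsilon$ for the variance of the noise (notation introduced for this sketch), are:

```latex
\begin{aligned}
\varepsilon &= 0.001^2 = 10^{-6},\\
\lambda_1 &= \tfrac{1}{2}\left(4 + \varepsilon + \sqrt{(4+\varepsilon)^2 - 4\varepsilon}\right) \approx 4,\\
\lambda_2 &= \lambda_3 = 1,\\
\lambda_4 &= \tfrac{1}{2}\left(4 + \varepsilon - \sqrt{(4+\varepsilon)^2 - 4\varepsilon}\right) \approx \tfrac{\varepsilon}{4} = 2.5 \times 10^{-7}.
\end{aligned}
```

The eigenvalues sum to the trace $6 + \varepsilon$ and multiply to the determinant $\varepsilon$. The fourth eigenvalue is the minor randomness that keeps the matrix from being singular. Discarding the direction associated with it leaves three random variables that account for almost all of the variance.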

Exercises

Which of the following covariance matrices are singular? Which are multicollinear?



Related Internal Links

Cholesky matrix: A lower-triangular matrix that acts as a matrix "square root" for a positive definite matrix.

correlation: A parameter that indicates the tendency for two random variables to "move together" or "co-vary."

eigenvalue, eigenvector: Concepts from linear algebra.

joint normal distribution: A multivariate distribution with normal marginal distributions.

linear polynomial of a random vector: A random variable or random vector that is defined as a linear polynomial of a random vector.

positive definite matrix: A real symmetric matrix, all of whose eigenvalues are real and positive.

principal component analysis: A technique for orthogonalizing a random vector.



website: http://www.contingencyanalysis.com
glossary direct link: http://www.riskglossary.com
copyright © Contingency Analysis, 2006