A random vector is
multicollinear if it is "almost"
singular. Let's consider an
example from finance.
Suppose we are analyzing the
market risk in a natural gas
trading portfolio. Random variables represent tomorrow’s values for each
price the portfolio is exposed to. The portfolio holds New York Mercantile
Exchange (NYMEX) Henry Hub futures out to 24 months, so there are 24
futures prices. It also has
forward positions out to 18 months
for 30 delivery points, for another 540 prices. In total, our model
depends upon a vector of 564 random variables!
Based upon a time series analysis of historical price data, we construct
a 564 × 564 covariance matrix for our random vector of prices. Gazing at
the 318,096 variances and covariances of our matrix, we wonder: Do we
really need all these numbers?
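The counts above can be checked with a few lines of arithmetic. The sketch below also notes that a covariance matrix is symmetric, so only the diagonal and one triangle hold distinct values; the variable names are illustrative only.

```python
# Count the prices the natural gas portfolio is exposed to.
futures_prices = 24          # NYMEX Henry Hub futures, one price per month
forward_prices = 18 * 30     # 18 months for each of 30 delivery points
n = futures_prices + forward_prices

total_entries = n * n                 # every cell of the n x n matrix
distinct_entries = n * (n + 1) // 2   # symmetry: diagonal plus one triangle

print(n, total_entries, distinct_entries)  # prints: 564 318096 159330
```

Even after allowing for symmetry, well over a hundred thousand numbers remain, so the question still stands.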
Intuitively, we know that the random variables are
interdependent. Prices for 6-month and 7-month Transco Zone 2 delivery are
highly correlated. So are 3-month prices for adjacent Transco Zones 1 and
2. Because of such interdependencies, it is conceivable that our random
vector is singular,
but this is probably not the case. Singularity arises infrequently in
applications. A more common situation is “almost” singularity, which is
known as multicollinearity.
We illustrate with two four-dimensional random vectors.
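Random vector X is singular. Its first three components $X_1$, $X_2$
and $X_3$ are uncorrelated, each with mean 0 and standard deviation 1.
The fourth component $X_4$ equals $X_1 + X_2 + X_3$.
The covariance matrix for X is
$$\begin{pmatrix} 1 & 0 & 0 & 1 \\ 0 & 1 & 0 & 1 \\ 0 & 0 & 1 & 1 \\ 1 & 1 & 1 & 3 \end{pmatrix} \qquad [1]$$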
Random vector Z is multicollinear. Like X, its first three components
$Z_1$, $Z_2$ and $Z_3$ are uncorrelated, each with mean 0 and standard
deviation 1. The fourth component $Z_4$ equals $Z_1 + Z_2 + Z_3 + E$,
where E is a “noise” random variable that is uncorrelated with $Z_1$,
$Z_2$ and $Z_3$ and has mean 0 and standard deviation .001. Except for
the addition of “noise” E, our random vector Z is identical to our
random vector X. Its covariance matrix is
$$\begin{pmatrix} 1 & 0 & 0 & 1 \\ 0 & 1 & 0 & 1 \\ 0 & 0 & 1 & 1 \\ 1 & 1 & 1 & 3.000001 \end{pmatrix} \qquad [2]$$
The covariance matrix of X is singular. It
has determinant 0. The covariance matrix of Z is not
singular, but with a determinant of .000001, it is “almost” singular. The
random variable $Z_4$ is almost a linear polynomial of $Z_1$, $Z_2$ and $Z_3$,
but not quite. We added just enough random “noise” to make it linearly
independent. We say a random vector is multicollinear if it is “almost”
singular in this sense.
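The two determinants can be verified directly. The sketch below builds both covariance matrices from the definitions of X and Z (the fourth component has covariance 1 with each of the first three, and variance 3, or 3 plus the noise variance for Z). The helper `determinant` is a hypothetical name written for this illustration, and exact rational arithmetic is an assumption made here so that the tiny determinant is not blurred by floating-point rounding.

```python
from fractions import Fraction

def determinant(matrix):
    """Determinant by Gaussian elimination in exact rational arithmetic."""
    a = [[Fraction(x) for x in row] for row in matrix]
    n = len(a)
    det = Fraction(1)
    for col in range(n):
        # Find a row at or below the diagonal with a nonzero pivot.
        pivot = next((r for r in range(col, n) if a[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)  # no pivot in this column: matrix is singular
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det  # a row swap flips the sign
        det *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return det

noise_variance = Fraction(1, 1000) ** 2  # standard deviation .001

cov_x = [[1, 0, 0, 1],
         [0, 1, 0, 1],
         [0, 0, 1, 1],
         [1, 1, 1, 3]]
cov_z = [row[:] for row in cov_x]
cov_z[3][3] = 3 + noise_variance  # only the variance of Z4 changes

det_x = determinant(cov_x)
det_z = determinant(cov_z)
print(det_x, det_z)  # prints: 0 1/1000000
```

The only difference between the two matrices is the noise variance in the bottom-right entry, and that variance is exactly the determinant of the second matrix.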
Realizations of a multicollinear random vector tend to
cluster near a plane within $\mathbb{R}^n$.
They don’t all lie in that plane, but they “almost” do. This is
illustrated with realizations of a two-dimensional multicollinear random
vector Z in Exhibit 1.
Exhibit 1: Realizations of a multicollinear two-dimensional random
vector Z. Component $Z_2$ is “almost” a linear polynomial of
component $Z_1$.
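Realizations like those in the exhibit can be simulated. The following is a minimal sketch under assumed parameters: the slope, intercept, noise level and normal distributions are hypothetical choices for illustration, not the values behind the exhibit.

```python
import random

random.seed(0)  # fixed seed so the run is repeatable

# Assumed model: Z2 = slope * Z1 + intercept + small noise.
slope, intercept, noise_sd = 2.0, 1.0, 0.05

points = []
for _ in range(1000):
    z1 = random.gauss(0.0, 1.0)
    z2 = slope * z1 + intercept + random.gauss(0.0, noise_sd)
    points.append((z1, z2))

# Largest vertical distance of any realization from the line.
max_gap = max(abs(z2 - (slope * z1 + intercept)) for z1, z2 in points)

# Sample correlation between the two components.
n = len(points)
mean1 = sum(p[0] for p in points) / n
mean2 = sum(p[1] for p in points) / n
var1 = sum((p[0] - mean1) ** 2 for p in points) / (n - 1)
var2 = sum((p[1] - mean2) ** 2 for p in points) / (n - 1)
cov12 = sum((p[0] - mean1) * (p[1] - mean2) for p in points) / (n - 1)
corr = cov12 / (var1 * var2) ** 0.5

print(max_gap, corr)
```

Every point sits close to the line, yet none needs to lie exactly on it, and the sample correlation comes out very close to, but below, 1.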
We may think of a random vector Z as being
“almost” singular if its covariance matrix has a determinant
close to 0. In practical applications, the magnitude of this determinant
will depend upon the units in which components of Z are
measured. A more reasonable test for multicollinearity is to consider the
determinant of the correlation matrix of Z. This determinant
will always be in the interval [0,1]. If it is very close to 0, this is an
indication of multicollinearity. Obviously, if it equals 0, Z
is singular.
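The dependence on units is easy to see in two dimensions. The sketch below uses hypothetical numbers (two prices with standard deviations of 0.30 and 0.25 dollars and a correlation of 0.999) and compares determinants when the same prices are quoted in dollars and in cents. The function names are illustrative only.

```python
import math

def covariance_matrix(sd1, sd2, rho):
    """2x2 covariance matrix from two standard deviations and a correlation."""
    c = rho * sd1 * sd2
    return [[sd1 * sd1, c], [c, sd2 * sd2]]

def correlation_matrix(cov):
    """Rescale a 2x2 covariance matrix so its diagonal is all ones."""
    sd = [math.sqrt(cov[i][i]) for i in range(2)]
    return [[cov[i][j] / (sd[i] * sd[j]) for j in range(2)] for i in range(2)]

def det2(m):
    """Determinant of a 2x2 matrix."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

rho = 0.999                                       # hypothetical correlation
cov_dollars = covariance_matrix(0.30, 0.25, rho)  # prices in dollars
cov_cents = covariance_matrix(30.0, 25.0, rho)    # same prices in cents

det_cov_dollars = det2(cov_dollars)
det_cov_cents = det2(cov_cents)
det_corr_dollars = det2(correlation_matrix(cov_dollars))
det_corr_cents = det2(correlation_matrix(cov_cents))

print(det_cov_dollars, det_cov_cents)    # differ by a factor of about 10**8
print(det_corr_dollars, det_corr_cents)  # both about 1 - rho**2
```

Changing units rescales the covariance determinant by a large factor, while the correlation determinant stays near 0.002 either way, which is why it is the better indicator.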
As described in the article on positive definite matrices,
the dimensionality of a singular random vector X
can be reduced with a simple change of variables. No information is lost,
as we only eliminate extraneous random variables. Multicollinearity is
more problematic. Reducing the dimensionality of a multicollinear random
vector Z requires an approximation that somehow identifies
and discards minor randomness that is preventing the covariance matrix
from being singular.
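For the four-dimensional examples, the difference can be made concrete. The sketch below writes X as a linear function of its first three components, X = AY with Y = (X1, X2, X3), and checks that A times its transpose reproduces the covariance matrix of X exactly. Applying the same three-variable representation to Z amounts to dropping the noise E, a naive approximation chosen here purely for illustration, and it misses the covariance matrix of Z by the noise variance. The matrix A and the helper name are assumptions of this sketch.

```python
from fractions import Fraction

# X = A Y, where Y = (X1, X2, X3) has the identity as its covariance matrix.
A = [[1, 0, 0],
     [0, 1, 0],
     [0, 0, 1],
     [1, 1, 1]]

def times_transpose(a):
    """Return a * a^T, the covariance of A Y when Y has identity covariance."""
    return [[sum(x * y for x, y in zip(row_i, row_j)) for row_j in a]
            for row_i in a]

cov_x = [[1, 0, 0, 1],
         [0, 1, 0, 1],
         [0, 0, 1, 1],
         [1, 1, 1, 3]]
cov_z = [row[:] for row in cov_x]
cov_z[3][3] = 3 + Fraction(1, 10**6)  # noise variance .001 squared

reduced = times_transpose(A)

# Largest entrywise difference between the reduced model and each matrix.
gap_x = max(abs(reduced[i][j] - cov_x[i][j]) for i in range(4) for j in range(4))
gap_z = max(abs(reduced[i][j] - cov_z[i][j]) for i in range(4) for j in range(4))

print(gap_x, gap_z)  # prints: 0 1/1000000
```

For X the three-variable representation is exact. For Z it is only approximate, and in a realistic problem it is not obvious in advance which randomness is minor enough to discard.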
This is the situation we face with our natural gas
portfolio. We feel confident that the natural gas market can reasonably be
modeled with fewer than 564 random variables, but we can’t arbitrarily
discard random variables! If our covariance matrix isn’t singular, how can
we replace our 564 random variables with a smaller set that conveys
essentially the same information?
Principal component analysis provides a
solution.