With principal component analysis, we transform a random vector Z
with correlated components into a random vector D with uncorrelated
components. This is called an orthogonalization of Z.
Principal component analysis can be performed on any
random vector Z whose second moments exist, but it is most
useful with multicollinear random vectors. Principal component
analysis takes the hyperplane in which realizations of a
multicollinear random vector “almost” sit and aligns it
with the coordinate axes.
The components of D that are perpendicular to the
transformed plane have small, almost trivial
standard deviations.
Discarding these components provides a lower-dimensional approximate
representation for Z. This is illustrated with realizations
of a multicollinear two-dimensional random vector Z in
Exhibit 1:
Exhibit 1: Principal component analysis can be used
to reduce the dimensionality of a multicollinear random vector.
Realizations for a multicollinear two-dimensional random vector
Z are illustrated in the left graph. Principal component
analysis transforms Z into an equivalent
multicollinear random vector D that is aligned with
the coordinate system. Realizations of D are shown in
the middle graph. Discarding the second component transforms
D into a one-dimensional approximate
representation of the two-dimensional Z. Realizations of this
representation are shown in the right graph.
Let's start with an intuitive example before presenting the formal
mathematics. Suppose today is June 30, 2000. We consider a random vector
Z whose components represent the simple price returns that
specific European currencies will realize versus the US dollar (USD) over
the upcoming trading day:
[Equation [1], which lists the currency returns that make up Z, is not
reproduced here.]
Exhibit 2 graphs 18 months of daily exchange-rate data
drawn from the period immediately following the launch of the new euro (EUR)
currency. In our data, the EUR weakens following its launch, and the
remaining European currencies—those that did not join the EUR on January
1, 1999—weaken in sympathy. All the currencies track the EUR, but the
GBP does so the least. It is
less correlated with the EUR and loses value more slowly.
Exhibit 2: Historical exchange rates versus the
USD for the period
January 1, 1999, through June 30, 2000. Exchange rates are presented
as USD/unit of currency, so a rising curve indicates a strengthening
currency. Exchange rates are individually scaled so they all fit on
the graph.
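We assume that Z has mean vector $\mu = 0$. Based upon a time series
analysis of the historical price data, we construct a covariance
matrix $\Sigma$ for Z.
[Equation [2], the numerical covariance matrix, is not reproduced here.]
The corresponding correlation matrix follows from it.
[Equation [3], the numerical correlation matrix, is not reproduced here.]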
The correlations are all positive. Several exceed 0.90.
The one between DKK and EUR exceeds 0.99. The smallest is a
respectable 0.45 between GBP and SEK. With such pronounced
interdependencies between its components, we expect Z to be
multicollinear, and it is.
The correlation matrix has determinant 0.0000045, which is very
close to 0, the value it would take if Z were singular.
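To define principal components of Z, we calculate orthonormal
(orthogonal and of unit length) eigenvectors
$v_1, v_2, \ldots, v_7$ of the covariance matrix $\Sigma$
of Z. We arrange these as the columns of a matrix:
$$ V = (v_1 \;\; v_2 \;\; \cdots \;\; v_7) $$
[4]
[The numerical entries of [4] are not reproduced here.]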
The eigenvectors $v_i$ are graphed in Exhibit 3. Corresponding
eigenvalues $\lambda_i$ are also indicated.
Exhibit 3: Eigenvectors of covariance matrix [2].
Corresponding eigenvalues are also indicated.
The eigenvectors may be thought of as “modes of
fluctuation” of random vector Z. We observed in our
historical data a tendency for the European currencies to move together.
This is reflected in the first eigenvector. It describes a broad move in
all the currencies, with the GBP participating about half as much as the
other currencies. The second eigenvector has the GBP moving in opposition
to the NOK and SEK, with the
CHF moving modestly with the
GBP. The third eigenvector describes the GBP, NOK, and SEK moving together
in opposition to the other currencies. The remaining eigenvectors describe
other “modes of fluctuation.”
If the eigenvectors $v_i$
are modes of fluctuation of Z, then Z is a
random combination of those modes of fluctuation:
$$ Z = D_1 v_1 + D_2 v_2 + \cdots + D_7 v_7 $$
[5]
The $D_i$ are the principal components of Z. They are random
variables that define each mode of fluctuation’s random contribution
to Z. The $D_i$ are uncorrelated, with
variances equal to the eigenvalues $\lambda_i$ of their corresponding
eigenvectors. The vector D of principal components has mean
0 and covariance matrix
$$ \Sigma_D = \operatorname{diag}(\lambda_1, \lambda_2, \ldots, \lambda_7) $$
[6]
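The construction above can be sketched numerically. The following is a minimal illustration, not the article's own calculation: the 3x3 matrix `sigma` is a hypothetical stand-in for the covariance matrix [2], and the helper name `principal_axes` is made up for this example. It computes orthonormal eigenvectors, orders them by decreasing eigenvalue, and checks that the covariance matrix of the principal components is diagonal.

```python
import numpy as np

# Hypothetical covariance matrix (NOT the article's matrix [2]).
sigma = np.array([
    [4.0, 3.6, 1.0],
    [3.6, 4.0, 1.2],
    [1.0, 1.2, 1.0],
])

def principal_axes(cov):
    """Return eigenvalues in descending order and matching orthonormal eigenvectors."""
    # eigh is intended for symmetric matrices and returns eigenvalues in ascending order.
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1]
    return eigvals[order], eigvecs[:, order]

lam, V = principal_axes(sigma)

# With mean 0, D = V'Z, so its covariance matrix is V' Sigma V.
cov_D = V.T @ sigma @ V

print("eigenvalues:", np.round(lam, 4))
print("covariance of D:\n", np.round(cov_D, 4))
```

The off-diagonal entries of `cov_D` vanish up to rounding error, and its diagonal holds the eigenvalues, which is the pattern shown in [6].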
We have ordered our principal components according to
their variances. From our covariance matrix [6],
we see that the first three principal components are more significant than
the rest. The last two principal components, $D_6$ and $D_7$,
have variances that are less than 1% of the variance of $D_1$.
Their contribution to random vector Z is trivial.
We can approximate Z by discarding from [5]
insignificant principal components. The more we discard, the simpler
(and cruder!) our approximation will be. If we want to be aggressive in
our approximation, we can discard the contributions of the last four
principal components, and approximate Z with just the first three. A
more accurate approximation can be obtained by discarding only the last
two. For this example, we pursue the more aggressive course. We
define
$$ \tilde{Z} = D_1 v_1 + D_2 v_2 + D_3 v_3 $$
[7]
and approximate Z with $\tilde{Z}$.
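Like Z, the approximation $\tilde{Z}$
has mean vector 0. Its covariance matrix is obtained from [6]
and [7] (see the article
linear polynomial of a random vector). In terms of the three
largest eigenvalues $\lambda_i$ and their eigenvectors $v_i$, it is
$$ \tilde{\Sigma} = \lambda_1 v_1 v_1' + \lambda_2 v_2 v_2' + \lambda_3 v_3 v_3' $$
[8]
[The numerical entries of [8] are not reproduced here.]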
Comparing this covariance matrix with [2],
you can judge for yourself the quality of the approximation.
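A comparison of this kind can be sketched in code. The example below assumes a hypothetical 3x3 covariance matrix rather than the article's [2], and keeps two principal components instead of three. The function name `truncated_covariance` is an illustrative choice. It rebuilds the covariance matrix of the approximation from the retained eigenvalues and eigenvectors and measures how far it is from the original.

```python
import numpy as np

# Hypothetical covariance matrix (NOT the article's matrix [2]).
sigma = np.array([
    [4.0, 3.6, 1.0],
    [3.6, 4.0, 1.2],
    [1.0, 1.2, 1.0],
])

# Orthonormal eigenvectors, ordered by decreasing eigenvalue.
eigvals, eigvecs = np.linalg.eigh(sigma)
order = np.argsort(eigvals)[::-1]
lam, V = eigvals[order], eigvecs[:, order]

def truncated_covariance(lam, V, k):
    """Covariance matrix of the approximation that keeps the first k principal components."""
    Vk = V[:, :k]
    return Vk @ np.diag(lam[:k]) @ Vk.T

sigma_approx = truncated_covariance(lam, V, k=2)

# Share of total variance carried by the retained components.
explained = lam[:2].sum() / lam.sum()
# Frobenius norm of what was discarded.
error = np.linalg.norm(sigma - sigma_approx, ord="fro")

print("approximate covariance:\n", np.round(sigma_approx, 4))
print("share of variance retained:", round(float(explained), 4))
print("Frobenius error:", round(float(error), 4))
```

When a single component is discarded, as here, the Frobenius error equals its eigenvalue, so small discarded eigenvalues translate directly into a small error in the covariance matrix.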
Our example informally introduced principal components. Now let’s
formalize them. Consider an n-dimensional random vector Z
with mean $\mu$
and nonsingular covariance matrix $\Sigma$.
We construct principal components in such a manner that the first accounts
for as much of the variability of Z as possible. The second
accounts for as much of the remaining variability of Z as
possible, and so on.
Specifically, the first principal component $D_1$
is defined as
$$ D_1 = v_1'(Z - \mu) $$
[9]
where $v_1$
has unit length and is selected to maximize the variance of $D_1$.
This is achieved by setting $v_1$
equal to the normalized first eigenvector of $\Sigma$,
that is, the eigenvector with the largest eigenvalue. In this case, the
variance of $D_1$
equals that eigenvalue, $\lambda_1$.
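A brief sketch of why the maximizing unit vector is an eigenvector, using a Lagrange multiplier. Here $v_1$ denotes the unit vector being chosen, $\Sigma$ the covariance matrix of Z, and $\lambda$ the multiplier. This is a standard argument supplied for illustration, not a derivation taken from the article.

```latex
\begin{align*}
\operatorname{Var}(D_1) &= \operatorname{Var}\bigl(v_1'(Z-\mu)\bigr) = v_1' \Sigma v_1, \\
\mathcal{L}(v_1, \lambda) &= v_1' \Sigma v_1 - \lambda\,(v_1' v_1 - 1), \\
\frac{\partial \mathcal{L}}{\partial v_1} &= 2\Sigma v_1 - 2\lambda v_1 = 0
  \;\Longrightarrow\; \Sigma v_1 = \lambda v_1, \\
\operatorname{Var}(D_1) &= v_1' \Sigma v_1 = \lambda\, v_1' v_1 = \lambda .
\end{align*}
```

Every stationary point is therefore an eigenvector of $\Sigma$, and the variance attained there is its eigenvalue, so the maximum is reached at the eigenvector with the largest eigenvalue.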
The second principal component $D_2$
is defined as
$$ D_2 = v_2'(Z - \mu) $$
[10]
where $v_2$
is selected from the set of all n-dimensional unit vectors that are
orthogonal to $v_1$
in such a manner as to maximize the variance of $D_2$.
This is achieved by setting $v_2$
equal to the normalized second eigenvector of $\Sigma$,
that is, the eigenvector with the second largest eigenvalue. The variance
of $D_2$ equals that eigenvalue, $\lambda_2$.
Proceeding in this manner, we define the remaining
principal components. There will be m principal components
$D_1, D_2, \ldots, D_m$,
each one corresponding to a normalized eigenvector $v_i$
of $\Sigma$.
We can represent Z as
$$ Z = D_1 v_1 + D_2 v_2 + \cdots + D_m v_m + \mu $$
[11]
The vector of principal components D has
mean 0 and covariance matrix
$$ \Sigma_D = \operatorname{diag}(\lambda_1, \lambda_2, \ldots, \lambda_m) $$
[12]
If $\Sigma$
is nonsingular, the number m of principal components equals the
dimensionality n of Z. If $\Sigma$
is singular, some of its eigenvalues will equal 0, and the number m
of principal components will be less than the dimensionality n of
Z. In this case, [12] will have reduced the dimensionality
of the singular Z in exactly the same manner as that
described in the article
positive definite,
positive semidefinite covariance matrix.
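The singular case can be illustrated with a small sketch. The example is hypothetical: a three-dimensional vector whose third component is exactly the sum of the first two, so its covariance matrix has rank 2. The cutoff `tol` used to decide which eigenvalues count as 0 is an assumption of this sketch, since floating-point eigenvalues are rarely exactly 0.

```python
import numpy as np

# Hypothetical singular example: Z3 = Z1 + Z2 exactly.
A = np.array([
    [1.0, 0.0],
    [0.0, 1.0],
    [1.0, 1.0],
])
cov_x = np.array([
    [2.0, 0.5],
    [0.5, 1.0],
])
sigma = A @ cov_x @ A.T  # 3x3 covariance matrix of rank 2

# Orthonormal eigenvectors, ordered by decreasing eigenvalue.
eigvals, eigvecs = np.linalg.eigh(sigma)
order = np.argsort(eigvals)[::-1]
lam, V = eigvals[order], eigvecs[:, order]

# Count eigenvalues that are meaningfully above 0 (tolerance is an assumption).
tol = 1e-10 * lam[0]
m = int(np.sum(lam > tol))

# Keeping only the m nonzero modes loses nothing in the singular case.
Vm = V[:, :m]
sigma_rebuilt = Vm @ np.diag(lam[:m]) @ Vm.T

print("n =", sigma.shape[0], " m =", m)
print("max reconstruction error:", np.max(np.abs(sigma - sigma_rebuilt)))
```

Here n is 3 but m is 2, and the two retained principal components reproduce the covariance matrix up to rounding error.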
Principal component analysis is best performed on random variables whose
standard deviations are reflective of their relative significance for an
application. This is because principal component analysis depends upon
both the correlations between random variables and the standard deviations
of those random variables. If we were to change standard deviations of a
set of random variables but leave their correlations the same, this would
change their principal components. In a sense, principal component
analysis uses standard deviation as a
metric of
significance. If one random variable has a standard deviation that far
exceeds the rest, that random variable will dominate the first
eigenvector.
Unfortunately, there may be no correspondence between a
random variable’s standard deviation and its significance. Standard
deviations depend upon the units in which a random variable is measured.
Suppose a random variable reflects the time it takes for some event to
occur, and when the random variable is measured in days, it has a standard
deviation of 13.5. If the random variable is measured in hours, its
standard deviation is 324. Measured in minutes, it becomes 19,440.
Certainly, the 19,440 standard deviation is no more significant than the
13.5 standard deviation, but principal component analysis will treat it as
more significant!
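The effect of units can be seen in a small sketch. The second variable, its standard deviation of 20, and the correlation of 0.3 are hypothetical values chosen only for illustration, and the helper names are made up for this example. Only the 13.5-day standard deviation and its conversion to minutes come from the text.

```python
import numpy as np

CORR = 0.3  # hypothetical correlation between the two variables

def covariance_from(stds):
    """Build a 2x2 covariance matrix from standard deviations and the fixed correlation."""
    stds = np.asarray(stds, dtype=float)
    corr_matrix = np.array([[1.0, CORR], [CORR, 1.0]])
    return np.outer(stds, stds) * corr_matrix

def first_eigenvector(cov):
    """Unit eigenvector with the largest eigenvalue, sign chosen so its largest entry is positive."""
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvalues in ascending order
    v = eigvecs[:, -1]
    return v if v[np.argmax(np.abs(v))] > 0 else -v

# Same random variables, same correlation; only the units of the first one change.
v_days = first_eigenvector(covariance_from([13.5, 20.0]))
v_minutes = first_eigenvector(covariance_from([13.5 * 24 * 60, 20.0]))

print("first eigenvector, time in days:   ", np.round(v_days, 4))
print("first eigenvector, time in minutes:", np.round(v_minutes, 4))
```

With time in days, the first eigenvector leans toward the second variable. With time in minutes, it points almost entirely along the first, even though nothing about the underlying quantities has changed.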
If we use principal components only to orthogonalize a
random vector, this will not be a problem. No information is lost. It will
be a problem if principal components are discarded to form an
approximation. In this case, information is lost. Before we discard
principal components that appear “insignificant,” we should make sure that
they truly are insignificant.
There are various solutions to this problem. We might
insist that all random variables be measured in the same units, but this
is not always feasible. If one random variable represents temperature and
another represents volume, these are fundamentally different quantities.
Also, identical units do not necessarily correspond to identical
significance. Suppose we are analyzing blood samples for lead, and we have
a random variable for each component of the blood. All components are
measured in parts per million (ppm). Measured in ppm, the standard
deviation of lead will be trivial compared to standard deviations for
other constituents of the blood. Yet, the lead component is the most
important random variable!
Alternatively, we might apply principal component analysis
to normalized random variables obtained by dividing each random variable
$Z_i$ by its standard deviation $\sigma_i$:
$$ \frac{Z_i}{\sigma_i} $$
[13]
With this approach, we effectively apply principal
component analysis to the random variables’ correlation matrix. This
represents a different weighting from that obtained by measuring all
random variables in identical units—but not necessarily a better one.
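A short sketch of this normalization, under the assumption of a hypothetical 3x3 covariance matrix: dividing each variable by its standard deviation turns the covariance matrix into the correlation matrix, and the result no longer depends on the units of any single variable. The function name `correlation_from_covariance` is an illustrative choice.

```python
import numpy as np

def correlation_from_covariance(cov):
    """Covariance matrix of the variables after each is divided by its standard deviation."""
    std = np.sqrt(np.diag(cov))
    return cov / np.outer(std, std)

# Hypothetical covariance matrix, used only for illustration.
sigma = np.array([
    [4.0, 3.6, 1.0],
    [3.6, 4.0, 1.2],
    [1.0, 1.2, 1.0],
])

# Change the units of the first variable (for example, days to minutes).
scale = np.diag([1440.0, 1.0, 1.0])
sigma_rescaled = scale @ sigma @ scale

corr_a = correlation_from_covariance(sigma)
corr_b = correlation_from_covariance(sigma_rescaled)

# Eigenvalues of the correlation matrix: variances of the principal
# components of the normalized variables.
eig_a = np.linalg.eigvalsh(corr_a)[::-1]
eig_b = np.linalg.eigvalsh(corr_b)[::-1]

print("eigenvalues, original units:", np.round(eig_a, 4))
print("eigenvalues, rescaled units:", np.round(eig_b, 4))
```

Both runs give the same eigenvalues, and they sum to the number of variables because every normalized variable has variance 1. This removes the dependence on units, but it also forces every variable to carry equal weight, which is itself a choice.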
Any solution may be reasonable in certain contexts and
unreasonable in others. Each one weights the random variables in some
manner. There is no objective way to assign weights, just as there is no
objective way to assign “significance.” Weights and “significance” can and
should vary from one application to another. When we use principal
components to reduce the dimensionality of a random vector, there is
subjectivity in the process.
Consider a random vector
Z with mean $\mu$ and covariance matrix $\Sigma$.
[Equation [e1], giving the numerical values of the mean and covariance
matrix, is not reproduced here.]
a. Calculate the determinant of the corresponding
correlation matrix.
b. Is Z singular, multicollinear, or neither
of these?
c. Calculate the eigenvalues and eigenvectors of $\Sigma$.
d. Represent Z in terms of its principal
components as in [11].
e. What is the covariance matrix
of the vector of principal components D?
f. Construct an approximation $\tilde{Z}$
for Z based on the first two principal
components of Z.
g. Construct the covariance matrix of $\tilde{Z}$.
Compare your result with the covariance matrix of Z.