Principal Component Analysis

Explained: principal component, principal component analysis


With principal component analysis, we transform a random vector Z with correlated components into a random vector D with uncorrelated components. This is called an orthogonalization of Z.

Principal component analysis can be performed on any random vector Z whose second moments exist, but it is most useful with multicollinear random vectors. Principal component analysis takes the hyperplane in which realizations of a multicollinear random vector “almost” sit and aligns it with the coordinate axes. The components of D that are perpendicular to the transformed plane have small, almost trivial standard deviations. Discarding these components provides a lower-dimensional approximate representation for Z. This is illustrated with realizations of a multicollinear two-dimensional random vector Z in Exhibit 1:

Dimension Reduction with Principal Component Analysis
Exhibit 1

Principal component analysis can be used to reduce the dimensionality of a multicollinear random vector. Realizations for a multicollinear two-dimensional random vector Z are illustrated in the left graph. Principal component analysis transforms Z into an equivalent multicollinear random vector D that is aligned with the coordinate system. Realizations of D are shown in the middle graph. Discarding the second component transforms D into a one-dimensional approximate representation of the two-dimensional Z. Realizations of this representation are shown in the right graph.
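The three panels described in Exhibit 1 can be imitated numerically. The following is a minimal sketch, assuming simulated data rather than the data behind the exhibit; the variable names and the 0.8 and 0.05 coefficients are hypothetical choices made only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical multicollinear 2-D realizations: the second component is
# almost a multiple of the first, so the points "almost" sit on a line.
x = rng.normal(size=1000)
z = np.column_stack([x, 0.8 * x + 0.05 * rng.normal(size=1000)])
z -= z.mean(axis=0)  # center the realizations (left graph)

# Orthonormal eigenvectors of the sample covariance matrix,
# reordered so the largest eigenvalue comes first.
cov = np.cov(z, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)  # eigh returns ascending order
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Rotate Z into D (middle graph), then keep only the first component
# and map it back to the original coordinates (right graph).
d = z @ eigvecs
z_approx = np.outer(d[:, 0], eigvecs[:, 0])

print("standard deviations of D:", d.std(axis=0, ddof=1))
print("largest absolute approximation error:", np.abs(z - z_approx).max())
```

The second standard deviation printed is small relative to the first, which is what justifies dropping that component in this simulated case.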

Example: European Currencies
Let's start with an intuitive example before presenting the formal mathematics. Suppose today is June 30, 2000. We consider a random vector Z whose components represent the simple price returns that specific European currencies will realize versus the US dollar (USD) over the upcoming trading day:

[1]

Exhibit 2 graphs 18 months of daily exchange-rate data drawn from the period immediately following the launch of the new euro (EUR) currency. In our data, the EUR weakens following its launch, and the remaining European currencies—those that did not join the EUR on January 1, 1999—weaken in sympathy. All the currencies track the EUR, but the GBP does so the least. It is less correlated with the EUR and loses value more slowly.

European Exchange Rates:
January 1999 Through June 2000

Exhibit 2

Historical exchange rates versus the USD for the period January 1, 1999, through June 30, 2000. Exchange rates are presented as USD/unit of currency, so a rising curve indicates a strengthening currency. Exchange rates are individually scaled so they all fit on the graph.

We assume Z has mean vector 0. Based upon a time series analysis of the historical price data, we construct a covariance matrix for Z:

[2]

The corresponding correlation matrix is:

[3]

The correlations are all positive. Several exceed 0.90. The one between DKK and EUR exceeds 0.99. The smallest is a respectable 0.45 between GBP and SEK. With such pronounced interdependencies between its components, we expect Z to be multicollinear, and it is. The correlation matrix has determinant 0.0000045, a value so close to 0 that the matrix is almost singular.
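The determinant check is easy to reproduce for any covariance matrix. The sketch below uses a small hypothetical 3x3 covariance matrix, not the article's matrix [2], converts it to a correlation matrix, and computes the determinant.

```python
import numpy as np

# Hypothetical covariance matrix for three daily returns
# (illustrative values only, not the article's matrix [2]).
cov = np.array([
    [1.00, 0.98, 0.50],
    [0.98, 1.00, 0.52],
    [0.50, 0.52, 1.00],
]) * 1e-4

# Correlation matrix: divide each covariance by the product
# of the two corresponding standard deviations.
std = np.sqrt(np.diag(cov))
corr = cov / np.outer(std, std)

# A determinant near 0 signals that the matrix is almost singular.
det = np.linalg.det(corr)
print("correlation matrix:\n", corr)
print("determinant:", det)
```

A correlation matrix with no interdependence at all is the identity, whose determinant is 1, so the determinant gives a rough scale from 1 (uncorrelated) down to 0 (singular).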

To define principal components of Z, we calculate orthonormal (orthogonal and of unit length) eigenvectors of the covariance matrix of Z. We arrange these as the columns of a matrix:

[4]

The eigenvectors are graphed in Exhibit 3. Corresponding eigenvalues are also indicated:

Example: Eigenvectors for Currency Returns
Exhibit 3

Eigenvectors of covariance matrix [2]. Corresponding eigenvalues are also indicated.


The eigenvectors may be thought of as “modes of fluctuation” of random vector Z. We observed in our historical data a tendency for the European currencies to move together. This is reflected in the first eigenvector. It describes a broad move in all the currencies, with the GBP participating about half as much as the other currencies. The second eigenvector has the GBP moving in opposition to the NOK and SEK, with the CHF moving modestly with the GBP. The third eigenvector describes the GBP, NOK, and SEK moving together in opposition to the other currencies. The remaining eigenvectors describe other “modes of fluctuation.”
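Computing orthonormal eigenvectors and arranging them as columns can be sketched as follows. The covariance matrix here is hypothetical, and the name V for the eigenvector matrix is an assumption made for this example.

```python
import numpy as np

# Hypothetical 3x3 covariance matrix (illustrative values only).
cov = np.array([
    [4.0, 2.0, 0.6],
    [2.0, 3.0, 0.4],
    [0.6, 0.4, 1.0],
])

# eigh is intended for symmetric matrices and returns orthonormal
# eigenvectors as columns, with eigenvalues in ascending order.
eigvals, eigvecs = np.linalg.eigh(cov)

# Reorder so the eigenvector with the largest eigenvalue comes first.
order = np.argsort(eigvals)[::-1]
eigvals = eigvals[order]
V = eigvecs[:, order]

print("eigenvalues:", eigvals)
print("eigenvectors (columns):\n", V)
print("V'V is the identity:", np.allclose(V.T @ V, np.eye(3)))
```

The sign of each eigenvector is arbitrary, so a column returned by the routine may be the negative of the "mode of fluctuation" one would naturally draw.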

If the eigenvectors are modes of fluctuation of Z, then Z is a random combination of those modes of fluctuation:

[5]

The random coefficients in [5] are the principal components of Z. They are random variables that define each mode of fluctuation’s random contribution to Z. The principal components are uncorrelated, with variances equal to the eigenvalues of their corresponding eigenvectors. The vector D of principal components has mean 0 and covariance matrix

[6]

We have ordered our principal components according to their variances. From covariance matrix [6], we see that the first three principal components are more significant than the rest. The last two principal components have variances that are less than 1% of the variance of the first. Their contribution to random vector Z is trivial.
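One way to judge which principal components are insignificant is to compare each variance with the total and with the largest one. The sketch below assumes a hypothetical set of eigenvalues, not the values in [6], and a 1% cutoff chosen only for illustration.

```python
import numpy as np

# Hypothetical variances of the principal components, largest first
# (illustrative values only, not the diagonal of the article's [6]).
eigvals = np.array([5.2, 1.1, 0.4, 0.03, 0.01])

share = eigvals / eigvals.sum()            # fraction of total variance
cumulative = np.cumsum(share)              # running total of those fractions
relative_to_first = eigvals / eigvals[0]   # variance relative to the largest

# Flag components whose variance is under 1% of the first one.
negligible = relative_to_first < 0.01

for i, (s, c, r, n) in enumerate(
    zip(share, cumulative, relative_to_first, negligible), start=1
):
    print(f"component {i}: share={s:.4f} cumulative={c:.4f} "
          f"vs first={r:.4f} negligible={n}")
```

Any cutoff of this kind is a judgment call, since a small variance is only insignificant if variance is a sensible measure of significance for the application.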

We can approximate Z by discarding from [5] insignificant principal components. The more we discard, the simpler—and cruder!—will be our approximation. If we want to be aggressive in our approximation, we can discard the contributions of the last four principal components, and approximate Z with just the first three. A more accurate approximation can be obtained by discarding only the last two. For this example, we pursue the more aggressive course. We define

[7]

and approximate Z with the random vector defined by [7]. Like Z, this approximation has mean vector 0. Its covariance matrix is obtained from [6] and [7] (see the article linear polynomial of a random vector):

[8]

Comparing this covariance matrix with [2], you can judge for yourself the quality of the approximation.
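The comparison can be made concrete. If only the first k principal components are kept, the covariance matrix of the approximation is built from the first k eigenvectors and eigenvalues, and the error is exactly the contribution of the discarded ones. The sketch below assumes a hypothetical 3x3 covariance matrix and k = 2; the names V_k and cov_approx are introduced for this example.

```python
import numpy as np

# Hypothetical covariance matrix (not the article's matrix [2]).
cov = np.array([
    [4.0, 2.0, 0.6],
    [2.0, 3.0, 0.4],
    [0.6, 0.4, 1.0],
])

eigvals, eigvecs = np.linalg.eigh(cov)
order = np.argsort(eigvals)[::-1]
eigvals, V = eigvals[order], eigvecs[:, order]

# Keep the first k principal components and discard the rest.
k = 2
V_k = V[:, :k]

# Covariance of the approximation: V_k diag(first k eigenvalues) V_k'.
cov_approx = V_k @ np.diag(eigvals[:k]) @ V_k.T

# The error is the part of the covariance carried by discarded components.
error = cov - cov_approx
print("approximate covariance matrix:\n", cov_approx)
print("error:\n", error)
print("Frobenius norm of error:", np.linalg.norm(error))
```

With a single discarded component, the Frobenius norm of the error equals the discarded eigenvalue, which gives a one-number summary of the quality of the approximation.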

Principal Components
Our example informally introduced principal components. Now let’s formalize them. Consider an n-dimensional random vector Z with a mean vector and a nonsingular covariance matrix. We construct principal components in such a manner that the first accounts for as much of the variability of Z as possible. The second accounts for as much of the remaining variability of Z as possible, and so on.

Specifically, the first principal component is defined as

[9]

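where the weight vector has unit length and is selected to maximize the variance of the first principal component. This is achieved by setting the weight vector equal to the normalized first eigenvector of the covariance matrix of Z, that is, the eigenvector with the largest eigenvalue. In this case, the variance of the first principal component equals that eigenvalue.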

The second principal component is defined as

[10]

where the weight vector is selected from the set of all n-dimensional unit vectors that are orthogonal to the first eigenvector, in such a manner as to maximize the variance of the second principal component. This is achieved by setting the weight vector equal to the normalized second eigenvector of the covariance matrix of Z, that is, the eigenvector with the second largest eigenvalue. The variance of the second principal component equals that eigenvalue.
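The maximization property can be checked numerically. The sketch below assumes a hypothetical covariance matrix and compares the variance along many random unit vectors with the two largest eigenvalues. It is a spot check by random search, not a proof, and the helper name variance_along is invented for this example.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical covariance matrix (illustrative values only).
cov = np.array([
    [4.0, 2.0, 0.6],
    [2.0, 3.0, 0.4],
    [0.6, 0.4, 1.0],
])

eigvals, eigvecs = np.linalg.eigh(cov)  # ascending eigenvalues
v1 = eigvecs[:, -1]                     # eigenvector with largest eigenvalue
v2 = eigvecs[:, -2]                     # eigenvector with second largest

def variance_along(w, cov):
    """Variance of the linear combination w'Z when Z has covariance cov."""
    return w @ cov @ w

# Many random unit vectors.
w = rng.normal(size=(10000, 3))
w /= np.linalg.norm(w, axis=1, keepdims=True)
variances = np.einsum("ij,jk,ik->i", w, cov, w)

# The same vectors projected to be orthogonal to v1, then renormalized.
w_perp = w - np.outer(w @ v1, v1)
w_perp /= np.linalg.norm(w_perp, axis=1, keepdims=True)
variances_perp = np.einsum("ij,jk,ik->i", w_perp, cov, w_perp)

print("largest eigenvalue:", eigvals[-1], "best random:", variances.max())
print("second eigenvalue:", eigvals[-2], "best orthogonal:", variances_perp.max())
```

No random unit vector beats the first eigenvector, and no unit vector orthogonal to it beats the second.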

Proceeding in this manner, we define the remaining principal components. There will be m principal components, each one corresponding to a normalized eigenvector of the covariance matrix of Z. We can represent Z in terms of them:

[11]

The vector of principal components D has mean 0 and covariance matrix

[12]
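For reference, the construction described in this section can be written compactly in one commonly used notation. The symbols μ (mean vector of Z), v_i (normalized eigenvectors) and λ_i (eigenvalues) are assumptions introduced here, and the lines below are a sketch of the standard formulation rather than a reproduction of the article's own equations.

```latex
\begin{aligned}
D_1 &= v_1^{\top}(Z - \mu), & v_1 &= \arg\max_{\|v\| = 1} \operatorname{var}\!\left(v^{\top} Z\right), \\
D_2 &= v_2^{\top}(Z - \mu), & v_2 &= \arg\max_{\|v\| = 1,\; v \perp v_1} \operatorname{var}\!\left(v^{\top} Z\right), \\
Z &= \mu + \sum_{i=1}^{m} D_i \, v_i, \\
\operatorname{cov}(D) &= \operatorname{diag}(\lambda_1, \ldots, \lambda_m), & \lambda_1 &\ge \cdots \ge \lambda_m .
\end{aligned}
```

Subtracting μ is what gives each principal component mean 0. When Z itself has mean 0, as in the currency example, the representation reduces to a sum of the eigenvectors weighted by the principal components.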

If the covariance matrix of Z is nonsingular, the number m of principal components equals the dimensionality n of Z. If it is singular, some of its eigenvalues will equal 0, and the number m of principal components will be less than the dimensionality n of Z. In this case, [12] will have reduced the dimensionality of the singular Z in exactly the same manner as that described in the article positive definite, positive semidefinite covariance matrix.
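The singular case can be illustrated with a small sketch. It assumes a hypothetical three-dimensional Z whose third component is exactly the sum of the first two, so its covariance matrix is singular. The tolerance used to decide that an eigenvalue is 0 is also an assumption, needed because of floating-point rounding.

```python
import numpy as np

# Hypothetical construction: Z = A X with X two-dimensional, so the
# third component of Z equals the sum of the first two.
A = np.array([
    [1.0, 0.0],
    [0.0, 1.0],
    [1.0, 1.0],
])
cov_x = np.array([
    [2.0, 0.5],
    [0.5, 1.0],
])
cov = A @ cov_x @ A.T  # 3x3 covariance matrix of Z, singular by construction

eigvals, eigvecs = np.linalg.eigh(cov)
order = np.argsort(eigvals)[::-1]
eigvals, V = eigvals[order], eigvecs[:, order]

# Count eigenvalues that are nonzero up to rounding error.
tol = 1e-10 * eigvals[0]
m = int(np.sum(eigvals > tol))
V_m = V[:, :m]

print("eigenvalues:", eigvals)
print("number of principal components m:", m)
print("covariance recovered from m components:",
      np.allclose(V_m @ np.diag(eigvals[:m]) @ V_m.T, cov))
```

Here nothing is lost by keeping only m components, because the discarded component has zero variance.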

Choice of Weights
Principal component analysis is best performed on random variables whose standard deviations are reflective of their relative significance for an application. This is because principal component analysis depends upon both the correlations between random variables and the standard deviations of those random variables. If we were to change standard deviations of a set of random variables but leave their correlations the same, this would change their principal components. In a sense, principal component analysis uses standard deviation as a metric of significance. If one random variable has a standard deviation that far exceeds the rest, that random variable will dominate the first eigenvector.

Unfortunately, there may be no correspondence between a random variable’s standard deviation and its significance. Standard deviations depend upon the units in which a random variable is measured. Suppose a random variable reflects the time it takes for some event to occur and, measured in days, has a standard deviation of 13.5. If the random variable is measured in hours, its standard deviation is 324. Measured in minutes, it becomes 19,440. Certainly, the 19,440 standard deviation is no more significant than the 13.5 standard deviation, but principal component analysis will treat it as more significant!
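The effect of units on the first eigenvector can be seen in a two-variable sketch. It pairs the duration variable from the text with a hypothetical second variable whose standard deviation is 20 in its own units, and assumes a correlation of 0.3 between them. Both of those numbers, and the helper names, are invented for illustration.

```python
import numpy as np

# Assumed correlation between the duration and the second variable.
corr = np.array([
    [1.0, 0.3],
    [0.3, 1.0],
])

def cov_from_std(std):
    """Covariance matrix with the fixed correlations and given std devs."""
    std = np.asarray(std, dtype=float)
    return corr * np.outer(std, std)

def first_eigenvector(cov):
    """Normalized eigenvector with the largest eigenvalue, sign fixed."""
    eigvals, eigvecs = np.linalg.eigh(cov)
    v = eigvecs[:, -1]
    return v if v[np.argmax(np.abs(v))] > 0 else -v

# Same duration variable expressed in days, hours, and minutes.
results = {}
for label, scale in [("days", 1), ("hours", 24), ("minutes", 24 * 60)]:
    results[label] = first_eigenvector(cov_from_std([13.5 * scale, 20.0]))
    print(f"{label:8s} std={13.5 * scale:>8.1f} first eigenvector={results[label]}")
```

The correlation never changes, yet the first eigenvector swings from being weighted toward the second variable (days) to being almost entirely the duration variable (minutes).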

If we use principal components only to orthogonalize a random vector, this will not be a problem. No information is lost. It will be a problem if principal components are discarded to form an approximation. In this case, information is lost. Before we discard principal components that appear “insignificant,” we should make sure that they truly are insignificant.

There are various solutions to this problem. We might insist that all random variables be measured in the same units, but this is not always feasible. If one random variable represents temperature and another represents volume, these are fundamentally different quantities. Also, identical units do not necessarily correspond to identical significance. Suppose we are analyzing blood samples for lead, and we have a random variable for each component of the blood. All components are measured in parts per million (ppm). Measured in ppm, the standard deviation of lead will be trivial compared to standard deviations for other constituents of the blood. Yet, the lead component is the most important random variable!

Alternatively, we might apply principal component analysis to normalized random variables obtained by dividing each random variable by its standard deviation:

[13]

With this approach, we effectively apply principal component analysis to the random variables’ correlation matrix. This represents a different weighting from that obtained by measuring all random variables in identical units—but not necessarily a better one.
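The equivalence between normalizing the variables and using the correlation matrix can be verified on simulated data. The sketch below assumes a hypothetical three-dimensional sample whose components have very different scales; the correlation values and standard deviations are invented for the example.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical correlations and widely different standard deviations.
corr_true = np.array([
    [1.0, 0.6, 0.3],
    [0.6, 1.0, 0.2],
    [0.3, 0.2, 1.0],
])
std_true = np.array([1.0, 100.0, 0.01])
cov_true = corr_true * np.outer(std_true, std_true)

# Simulate realizations using a Cholesky factor of the covariance matrix.
chol = np.linalg.cholesky(cov_true)
z = rng.standard_normal((5000, 3)) @ chol.T

# Normalize: subtract the sample mean, divide by sample standard deviation.
z_std = (z - z.mean(axis=0)) / z.std(axis=0, ddof=1)

# The covariance matrix of the normalized variables is the sample
# correlation matrix of the original variables.
cov_of_normalized = np.cov(z_std, rowvar=False)
corr_sample = np.corrcoef(z, rowvar=False)

print("matrices agree:", np.allclose(cov_of_normalized, corr_sample))
print("eigenvalues of correlation matrix:", np.linalg.eigvalsh(corr_sample)[::-1])
```

Because every normalized variable has variance 1, the eigenvalues of the correlation matrix always sum to the number of variables, so no single variable can dominate merely through its units.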

Any solution may be reasonable in certain contexts and unreasonable in others. Each one weights the random variables in some manner. There is no objective way to assign weights, just as there is no objective way to assign “significance.” Weights and “significance” can and should vary from one application to another. When we use principal components to reduce the dimensionality of a random vector, there is subjectivity in the process.

Exercises

Consider a random vector Z with mean and covariance matrix

[e1]

a. Calculate the determinant of the corresponding correlation matrix.

b. Is Z singular, multicollinear, or neither of these?

c. Calculate the eigenvalues and eigenvectors of the covariance matrix.

d. Represent Z in terms of its principal components as in [11].

e. What is the covariance matrix of the vector of principal components D?

f. Construct an approximation for Z based on the first two principal components of Z.

g. Construct the covariance matrix of your approximation. Compare your result with the covariance matrix of Z.

[solution]

Related Internal Links

Cholesky matrix: A lower-triangular matrix that acts as a matrix "square root" for a positive definite matrix.

correlation: A parameter that indicates the tendency for two random variables to "move together" or "co-vary."

eigenvalue, eigenvector: Concepts from linear algebra.

joint normal distribution: A multivariate distribution with normal marginal distributions.

linear polynomial of a random vector: A random variable or random vector that is defined as a linear polynomial of a random vector.

multicollinear: A covariance matrix is multicollinear if it is "almost" singular.

positive definite matrix: A real symmetric matrix, all of whose eigenvalues are real and positive.

remapping: In value-at-risk, the approximation of one risk vector with another.


website: http://www.contingencyanalysis.com
glossary direct link: http://www.riskglossary.com
copyright © Contingency Analysis, 2006