[R] Note on PCA (not directly with R)
Nikos Alexandris
nikos.alexandris at felis.uni-freiburg.de
Thu Jul 1 04:51:03 CEST 2010
Christofer Bogaso wrote:
> Dear all, I am looking for some interactive study materials on Principal
> component analysis. Basically I would like to know what we are actually
> doing with PCA?
Having in mind the eigenvalue decomposition and a bivariate data set, the
sum-it-all in a few sentences, I think, is:
- PCA rotates (and, in its standardised variant, scales) the data set so that
the transformed axes (the principal components) align with the direction(s)
of maximum variance
- eigenvalues are the variances along the axes of variation, so their square
roots are proportional to the lengths of those axes
- eigenvectors (or characteristic vectors) define the rotation
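To make the three points above concrete, here is a minimal sketch in base R.
The bivariate data are simulated and purely hypothetical (the variable names,
the seed and the 0.8 slope are my own choices, not from any real data set):

```r
set.seed(1)

# Hypothetical bivariate data: two correlated variables
x <- rnorm(200)
y <- 0.8 * x + rnorm(200, sd = 0.4)
X <- cbind(x, y)

Xc <- scale(X, center = TRUE, scale = FALSE)   # mean-centre only
e  <- eigen(cov(Xc), symmetric = TRUE)         # EVD of the covariance matrix

e$vectors        # columns are the eigenvectors: they define the rotation
e$values         # eigenvalues: variances along the principal axes
sqrt(e$values)   # standard deviations, proportional to the axis lengths

# Rotate the centred data onto the principal axes
scores <- Xc %*% e$vectors

# Angle (in degrees) of the first principal axis relative to the x axis;
# its sign and quadrant depend on the arbitrary sign of the eigenvector
angle <- atan2(e$vectors[2, 1], e$vectors[1, 1]) * 180 / pi
```

Calling apply(scores, 2, var) afterwards should give back e$values, which is
the sense in which the eigenvalues measure the spread along the new axes.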
> What is happening within the dataset at the time of doing
> PCA.
The algorithm (classically):
- mean-centers the data matrix
- calculates the covariance matrix (non-standardised PCA) or the correlation
matrix (standardised PCA, i.e. the variables are also scaled to unit variance)
- calculates the eigenvalue decomposition (EVD), i.e. the eigenvectors and
eigenvalues, of that covariance (non-standardised) or correlation
(standardised) matrix
- sorts the eigenvalues (i.e. the variances) in decreasing order and finally
projects the centred (and, if standardised, scaled) data onto the
eigenvectors, which act as weighting coefficients; the results are what is
named Principal Components or scores.
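The steps above can be sketched in a few lines of base R. This is only an
illustration under my own assumptions: pca_evd() is a hypothetical helper
name, and the built-in iris measurements are used just because they are
always available. The result is compared against prcomp(), which works via
the SVD rather than the EVD but should agree up to the sign of each component:

```r
X <- as.matrix(iris[, 1:4])   # built-in data, chosen only for illustration

# Hypothetical helper following the classical steps
pca_evd <- function(X, standardise = FALSE) {
  # 1. mean-centre (and optionally scale to unit variance)
  Xc <- scale(X, center = TRUE, scale = standardise)
  # 2. covariance matrix; for standardised data this is the correlation matrix
  S <- cov(Xc)
  # 3. eigenvalue decomposition; eigen() returns values in decreasing order
  e <- eigen(S, symmetric = TRUE)
  # 4. project the centred data onto the eigenvectors to get the scores
  scores <- Xc %*% e$vectors
  list(sdev = sqrt(e$values), rotation = e$vectors, scores = scores)
}

fit <- pca_evd(X, standardise = TRUE)
ref <- prcomp(X, center = TRUE, scale. = TRUE)

all.equal(fit$sdev, ref$sdev)
# Eigenvector signs are arbitrary, so compare absolute values of the scores
all.equal(abs(fit$scores), abs(unname(ref$x)))
```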
The algorithm actually does three (or more?) things:
- minimises the mean square error of approximating the original data set
with a given number of components,
- keeps the maximum possible variance of the original data set in those
components,
- gives decorrelated variables
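These three properties can be checked numerically. A rough sketch with
prcomp() follows; the iris data and the choice of k = 2 retained components
are arbitrary assumptions on my part, only there to have something to run:

```r
X  <- as.matrix(iris[, 1:4])   # built-in data, chosen only for illustration
Xc <- scale(X, center = TRUE, scale = FALSE)
p  <- prcomp(X)                # non-standardised PCA (centres by default)
n  <- nrow(X)
k  <- 2                        # number of components kept (arbitrary)

# Decorrelated variables: off-diagonal covariances of the scores are ~0
S_scores <- cov(p$x)
max_offdiag <- max(abs(S_scores[upper.tri(S_scores)]))

# Variance is preserved: the eigenvalues sum to the total original variance
total_var_orig <- sum(diag(cov(X)))
total_var_pcs  <- sum(p$sdev^2)

# Approximation error: rebuild the centred data from the first k components;
# the squared error (divided by n - 1) equals the sum of the discarded
# eigenvalues
Xhat <- p$x[, 1:k] %*% t(p$rotation[, 1:k])
err  <- sum((Xc - Xhat)^2) / (n - 1)
discarded <- sum(p$sdev[-(1:k)]^2)

c(max_offdiag = max_offdiag, total_var_orig = total_var_orig,
  total_var_pcs = total_var_pcs, err = err, discarded = discarded)
```

The check only shows that the error equals the discarded variance for this
particular projection; that no other k-dimensional linear projection does
better is the theoretical result, which the code does not prove.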
> Probably a 3-dimensional interactive explanation would be best for me.
> I have gone through some online materials specially Wikipedia etc, however
> what I need a "movable explanation" to understand that.
>
> Any suggestion please?
For what it's worth, I think a 2-dimensional example is better to start with.
You can have a look at the plotpc package. It really is educational.
Good luck, Nikos