[R] PCA in Q- and R-modes
agingadvice at gmail.com
Wed Jan 18 19:35:31 CET 2017
I'm working with proteomic data, helping a student who knows biology and
has done analysis in R without understanding it in depth.
We have 3000 protein levels for each of 6 ages. I can treat this as 6 vectors in
3000-dimensional space, diagonalize a 6x6 covariance matrix, and find 5
principal components, with one zero eigenvalue. My student has been using R in
"Q mode": he enters the transposed matrix, i.e. 3000 vectors in
6-dimensional space. In just a few seconds, R diagonalizes a 3000x3000
matrix! I can't imagine what it means to diagonalize a 3000x3000
matrix. But, of course, there are only 5 degrees of freedom in the data,
so only 5 of the eigenvalues are non-zero, and the other 2995 vectors are
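A minimal sketch of why only 5 eigenvalues survive, assuming each protein is centered across the 6 ages (one centering that produces exactly one zero eigenvalue). It is written in Python rather than R, the data are hypothetical random integers rather than real measurements, and the helpers `centered_samples`, `gram` and `rank` are illustrative names. Scaling the centered values by the number of ages keeps everything in exact integer arithmetic, so the rank of the 6x6 matrix of inner products between the centered age vectors can be computed exactly.

```python
import random
from fractions import Fraction


def centered_samples(data):
    """Center every column (protein) across the rows (ages).

    Each entry is scaled by the number of rows n (n*x - column_sum) so the
    result stays integer; scaling by a constant does not change rank.
    """
    n = len(data)
    col_sums = [sum(col) for col in zip(*data)]
    return [[n * x - s for x, s in zip(row, col_sums)] for row in data]


def gram(rows):
    """n x n matrix of inner products between the row vectors."""
    return [[sum(a * b for a, b in zip(r, s)) for s in rows] for r in rows]


def rank(matrix):
    """Exact rank via Gaussian elimination over the rationals."""
    m = [[Fraction(x) for x in row] for row in matrix]
    r = 0
    for c in range(len(m[0])):
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                f = m[i][c] / m[r][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
    return r


random.seed(1)
n_ages, n_proteins = 6, 3000
# Hypothetical toy data: random integer "protein levels", not real measurements.
data = [[random.randint(0, 1000) for _ in range(n_proteins)] for _ in range(n_ages)]

X = centered_samples(data)  # the 6 centered age vectors now sum to zero
g = gram(X)
print(len(g), rank(g))  # a 6 x 6 matrix whose rank is at most 5
```

Because the 6 centered vectors sum to zero, they span at most 5 dimensions, and for generic data the rank is exactly 5.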
Questions:
a) Is there a relationship between the principal components of the 3000*6
matrix and the principal components of the transposed 6*3000 matrix?
b) Is there a way to find the 5 meaningful eigenvectors without carrying the
baggage of diagonalizing the huge matrix?
c) The big question: which version should we analyze and publish? My student
tells me the transposed matrix is the common procedure. The two yield very
different-looking plots.
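The relation behind (a) and (b) can be sketched with the standard duality between X X^T and X^T X, the same duality a singular value decomposition of the centered matrix X expresses. If u is a unit eigenvector of the small 6x6 matrix G = X X^T with eigenvalue lam, then v = X^T u / sqrt(lam) is a unit eigenvector of the 3000x3000 matrix X^T X with the same eigenvalue, so the big matrix never has to be built. This is a Python sketch under stated assumptions: the data are hypothetical (one planted age trend plus noise), only the leading eigenpair is found, by simple power iteration, and the helper names `matvec` and `transpose_matvec` are illustrative.

```python
import math
import random

random.seed(0)
n_ages, n_proteins = 6, 3000

# Hypothetical toy data (not real measurements): every protein gets a random
# slope with age plus Gaussian noise, so one direction dominates.
slopes = [random.gauss(0.0, 1.0) for _ in range(n_proteins)]
data = [[age * b + random.gauss(0.0, 0.5) for b in slopes] for age in range(n_ages)]

# Center each protein (column) across the six ages; X is 6 x 3000.
means = [sum(col) / n_ages for col in zip(*data)]
X = [[x - m for x, m in zip(row, means)] for row in data]


def matvec(A, v):
    """A v for a matrix stored as a list of rows."""
    return [sum(a * b for a, b in zip(row, v)) for row in A]


def transpose_matvec(A, u):
    """A^T u, accumulated row by row so A^T is never built."""
    out = [0.0] * len(A[0])
    for coef, row in zip(u, A):
        for j, a in enumerate(row):
            out[j] += coef * a
    return out


# Small 6 x 6 cross-product matrix G = X X^T.
G = [[sum(a * b for a, b in zip(r, s)) for s in X] for r in X]

# Power iteration for the leading eigenpair (lam, u) of G. A random start is
# used because the all-ones vector lies in G's null space after centering.
u = [random.gauss(0.0, 1.0) for _ in range(n_ages)]
for _ in range(100):
    w = matvec(G, u)
    norm = math.sqrt(sum(x * x for x in w))
    u = [x / norm for x in w]
lam = sum(a * b for a, b in zip(u, matvec(G, u)))  # Rayleigh quotient

# Map to the 3000-dimensional eigenvector without forming X^T X.
v = [x / math.sqrt(lam) for x in transpose_matvec(X, u)]

# Check (X^T X) v = lam v, evaluated as X^T (X v).
Cv = transpose_matvec(X, matvec(X, v))
residual = max(abs(a - lam * b) for a, b in zip(Cv, v))
print(lam, math.sqrt(sum(x * x for x in v)), residual / lam)
```

Because X v = sqrt(lam) * u, the six age scores on this component are, up to sign and scale, the eigenvector u of the small matrix. This holds for one fixed centered matrix X; centering the proteins across ages and centering the ages across proteins give different matrices, so the duality alone does not make two differently centered analyses agree. For R itself, the prcomp() documentation states that it works from a singular value decomposition of the centered data matrix rather than by calling eigen on the covariance matrix.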
Thanks for your help.
- Josh Mitteldorf