[R] prcomp - surprising structure
Hermann Norpois
hnorpois at gmail.com
Thu Oct 3 11:41:16 CEST 2013
Hello,
I ran a PCA on more than 200,000 SNPs for 340 observations (IDs). When I plot
eigenvectors 2, 3 and 4 (called rotation in prcomp), e.g. plot
(rotation[,2]), I see a strange "column" in my data (see attachment). I
suspect it is an artefact (but of what?).
Suggestion:
I used prcomp this way: prcomp (mat), where mat is a matrix with the column
means already subtracted, followed by a normalisation procedure (see below
for details). Is that okay? Or does prcomp repeat the subtraction step?
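
A minimal sketch of how the centring question can be checked on a toy matrix (the values below are made up, not the real genotypes). It relies on the documented defaults of prcomp, center = TRUE and scale. = FALSE, and compares a run with the default against a run with centring switched off:

```r
# toy genotype matrix: 6 IDs (rows) x 4 SNPs (columns), coded 0/1/2
g <- matrix(c(0, 1, 2, 1, 0, 2,
              2, 2, 1, 0, 1, 0,
              0, 0, 1, 2, 2, 1,
              1, 2, 0, 0, 1, 2), nrow = 6)

# subtract the column means by hand, as in the question
centred <- scale(g, center = TRUE, scale = FALSE)

p1 <- prcomp(centred)                  # default: center = TRUE
p2 <- prcomp(centred, center = FALSE)  # no second centring

# the means prcomp removed in the first run are numerically zero
print(p1$center)
# both runs give the same standard deviations of the components
print(all.equal(p1$sdev, p2$sdev))
```

If the column means are already zero, the second centring leaves the matrix unchanged, so passing center = FALSE mainly documents the intent.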
Originally my idea was to compute a covariance matrix and then use eigen,
but the covariance matrix was too huge to handle. So I switched to prcomp.
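
A hedged sketch of one way around the size problem, on simulated data: with far fewer IDs than SNPs, the n x n matrix between IDs (n = number of rows) has the same non-zero eigenvalues as the huge SNP x SNP covariance matrix, so eigen can be applied to the small one. The sizes and the random data here are assumptions for illustration only.

```r
set.seed(42)
n <- 10   # IDs (rows)
m <- 50   # SNPs (columns); stands in for the 200,000 in the real data
x <- matrix(rnorm(n * m), nrow = n)
x <- scale(x, center = TRUE, scale = FALSE)   # column-centred

# n x n matrix between IDs instead of the m x m covariance matrix
k <- tcrossprod(x) / (n - 1)
e <- eigen(k, symmetric = TRUE)

p <- prcomp(x, center = FALSE)

# the leading n - 1 eigenvalues equal the squared sdev from prcomp
# (the last one is numerically zero because of the centring)
print(all.equal(e$values[1:(n - 1)], p$sdev[1:(n - 1)]^2))

# scores of the IDs on PC1 agree up to sign
pc1.eigen <- e$vectors[, 1] * sqrt(e$values[1] * (n - 1))
print(all.equal(abs(unname(p$x[, 1])), abs(pc1.eigen)))
```

The eigenvectors of the small matrix correspond to the IDs (the x component of prcomp), not to the SNP loadings stored in rotation.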
As I guess that the "columns" in my plots are an artefact, I hope to get
some help. In case my use of prcomp was not okay, could you please tell me
how to use it correctly, including the normalisation procedure that I need
to apply before doing a PCA?
Thanks
Hermann
#
# mat: matrix with genotypes coded as 0, 1 and 2; SNPs in columns,
# IDs (observations) in rows.
#
prcomp.snp <- function (mat)
{
  m <- ncol (mat)
  for (i in seq_len (m))
  {
    # snps in columns
    ui <- mat[,i]
    # number of non-missing genotypes for this snp
    n <- sum (!is.na (ui))
    # see methods Price et al. as correction
    # (named p.i so that the built-in constant pi is not masked)
    p.i <- (1 + sum (ui, na.rm=TRUE))/(2 + 2*n)
    # subtract mean
    ui <- ui - mean (ui, na.rm=TRUE)
    # NAs set to zero
    ui[is.na (ui)] <- 0
    # normalisation of the genotype for each ID:
    # important normalisation step
    ui <- ui/ (sqrt (p.i*(1-p.i)))
    # fill matrix with ui
    mat[,i] <- ui
  }
  mat <- prcomp (mat)
  return (mat)
}
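
A possible vectorised sketch of the same normalisation, run on simulated genotypes. The helper name normalise.snp, the simulated matrix and its sizes are hypothetical and only meant to illustrate the steps of the loop above; it also shows which prcomp component has one row per SNP and which has one row per ID.

```r
# hypothetical helper: same steps as the per-column loop, done column-wise
normalise.snp <- function(mat) {
  n.obs <- colSums(!is.na(mat))                        # non-missing per SNP
  p <- (1 + colSums(mat, na.rm = TRUE)) / (2 + 2 * n.obs)
  mat <- sweep(mat, 2, colMeans(mat, na.rm = TRUE), "-")  # subtract mean
  mat[is.na(mat)] <- 0                                 # NAs set to zero
  sweep(mat, 2, sqrt(p * (1 - p)), "/")                # normalisation
}

set.seed(7)
# simulated genotypes: 20 IDs (rows) x 100 SNPs (columns), coded 0/1/2
geno <- matrix(rbinom(20 * 100, size = 2, prob = 0.3), nrow = 20,
               dimnames = list(paste0("id", 1:20), paste0("snp", 1:100)))
geno[sample(length(geno), 30)] <- NA                   # some missing calls

z <- normalise.snp(geno)
pca <- prcomp(z, center = FALSE)   # columns are already centred

dim(pca$rotation)   # 100 x 20: one row per SNP (loadings)
dim(pca$x)          # 20 x 20: one row per ID (scores)
```

With the real data, plot (rotation[,2]) would therefore draw one point per SNP in column order, while the x component holds one point per ID.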
-------------- next part --------------
A non-text attachment was scrubbed...
Name: rotplot.png
Type: image/png
Size: 17486 bytes
Desc: not available
URL: <https://stat.ethz.ch/pipermail/r-help/attachments/20131003/04b3d9e5/attachment.png>