[Rd] summary( prcomp(*, tol = .) ) -- and 'rank.'

Martin Maechler maechler at stat.math.ethz.ch
Thu Mar 24 18:09:27 CET 2016


Following from the R-help thread of March 22 on "Memory usage in prcomp",

I've started looking into adding an optional  'rank.'  argument
to prcomp(), which would make it more efficient to get only a few
PCs instead of all p PCs, say when p = 1000 and you know you
only want 5 PCs.

 (https://stat.ethz.ch/pipermail/r-help/2016-March/437228.html)
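
To make the idea concrete, here is a minimal sketch, not the actual
prcomp() change: it assumes the saving comes from asking svd() for only
k right singular vectors via its 'nv' argument. The helper name
first_pcs() is hypothetical and only for illustration.

```r
## Hypothetical helper (not part of 'stats'): first k PCs via svd(*, nv = k)
first_pcs <- function(x, k) {
    x <- scale(as.matrix(x), center = TRUE, scale = FALSE)
    s <- svd(x, nu = 0, nv = k)               # only k right singular vectors
    sdev <- s$d / sqrt(max(1, nrow(x) - 1))   # still *all* min(n, p) std.dev.s
    rot <- s$v
    colnames(rot) <- paste0("PC", seq_len(k))
    list(sdev = sdev, rotation = rot, x = x %*% rot)
}

set.seed(1)
X  <- matrix(rnorm(200 * 10), 200, 10)
p5 <- first_pcs(X, 5)
dim(p5$rotation)  # 10 5
length(p5$sdev)   # 10
```

Note that svd() still returns all singular values, so the full set of
standard deviations stays available even when only k loadings are kept.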

As was mentioned there, we already have an optional 'tol' argument
which allows one *not* to keep all PCs.

When I use it, say

     C <- chol(S <- toeplitz(.9 ^ (0:31))) # Cov.matrix and its root
     all.equal(S, crossprod(C))
     set.seed(17)
     X <- matrix(rnorm(32000), 1000, 32)
     Z <- X %*% C  ## ==>  cov(Z) ~=  C'C = S
     all.equal(cov(Z), S, tol = 0.08)
     pZ <- prcomp(Z, tol = 0.1)
     summary(pZ) # only ~14 PCs (out of 32)
     
For the last line, the  summary.prcomp(.)  call, I get:

> summary(pZ) # only ~14 PCs (out of 32)
Importance of components:
                          PC1    PC2    PC3    PC4     PC5     PC6     PC7     PC8
Standard deviation     3.6415 2.7178 1.8447 1.3943 1.10207 0.90922 0.76951 0.67490
Proportion of Variance 0.4352 0.2424 0.1117 0.0638 0.03986 0.02713 0.01943 0.01495
Cumulative Proportion  0.4352 0.6775 0.7892 0.8530 0.89288 0.92001 0.93944 0.95439
                           PC9    PC10    PC11    PC12    PC13   PC14
Standard deviation     0.60833 0.51638 0.49048 0.44452 0.40326 0.3904
Proportion of Variance 0.01214 0.00875 0.00789 0.00648 0.00534 0.0050
Cumulative Proportion  0.96653 0.97528 0.98318 0.98966 0.99500 1.0000
>

which computes the *proportions* as if there were only 14 PCs in
total (but there were 32 originally).

I would think that the summary should, or at least could, additionally
show the usual  "proportion of variance explained"  result, which
involves all 32 variances (or std.dev.s).  These are returned from
svd() anyway, even when I use my new 'rank.' argument, which returns
only a "few" PCs instead of all.

Do you think the current  summary() output is good enough, or
rather misleading?

I think I would want to see (possibly in addition) proportions
with respect to the full variance and not just to the variance
of those few components selected.
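
A minimal sketch of what I mean, not a proposed summary.prcomp()
implementation: it assumes the default centering without scaling, so
that the total variance is just the sum of the column variances of the
data. The names var.kept, var.total and prop.full are only for
illustration.

```r
set.seed(17)
S <- toeplitz(.9 ^ (0:31))
Z <- matrix(rnorm(32000), 1000, 32) %*% chol(S)
pZ <- prcomp(Z, tol = 0.1)

k <- ncol(pZ$rotation)                # number of PCs actually kept
## 'sdev' may be longer than k depending on the R version, hence the subsetting:
var.kept  <- pZ$sdev[seq_len(k)]^2    # variances of the kept PCs
var.total <- sum(apply(Z, 2, var))    # total variance of all 32 columns
prop.full <- var.kept / var.total     # proportions w.r.t. the *full* variance
round(rbind("Proportion of Variance" = prop.full,
            "Cumulative Proportion"  = cumsum(prop.full)), 5)
```

With these proportions the cumulative row stays below 1 whenever some
PCs have been dropped, instead of ending at 1.0000 as above.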

Opinions?

Martin Maechler
ETH Zurich


