[Rd] summary( prcomp(*, tol = .) ) -- and 'rank.'
Kasper Daniel Hansen
kasperdanielhansen at gmail.com
Thu Mar 24 19:58:24 CET 2016
Martin, I fully agree. This becomes an issue when you have big matrices.
(Note that there are awesome methods for computing only a small number
of PCs, unlike your code, which uses svd() and so gets all of them;
these are available in various CRAN packages.)
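
To illustrate what "only computing a small number of PCs" can look like,
here is a minimal base-R sketch of orthogonal (subspace) iteration. It is
not what any particular CRAN package does; the function name top_k_pcs
and the iters argument are made up for this example, and a fixed
iteration count stands in for a proper convergence test.

```r
# Hypothetical helper: leading k principal components by orthogonal
# (subspace) iteration, without a full decomposition of x.
top_k_pcs <- function(x, k, iters = 200) {
  x <- scale(x, center = TRUE, scale = FALSE)    # center the columns only
  p <- ncol(x)
  # random orthonormal starting basis (p x k)
  V <- qr.Q(qr(matrix(rnorm(p * k), p, k)))
  for (i in seq_len(iters)) {
    # multiply by t(x) %*% x without forming the p x p matrix,
    # then re-orthonormalize
    V <- qr.Q(qr(crossprod(x, x %*% V)))
  }
  # standard deviations of the projected data, as in prcomp()$sdev
  sdev <- sqrt(colSums((x %*% V)^2) / (nrow(x) - 1))
  list(sdev = sdev, rotation = V)
}

# Usage on the data from Martin's example below
C <- chol(toeplitz(.9 ^ (0:31)))
set.seed(17)
Z <- matrix(rnorm(32000), 1000, 32) %*% C
res <- top_k_pcs(Z, k = 5)
res$sdev   # compare with prcomp(Z)$sdev[1:5]
```

Each pass costs two products with x and one QR of a p x k matrix, so for
k much smaller than p this avoids the work of a full SVD. The columns of
rotation are only determined up to sign.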
Best,
Kasper
On Thu, Mar 24, 2016 at 1:09 PM, Martin Maechler <maechler at stat.math.ethz.ch> wrote:
> Following from the R-help thread of March 22 on "Memory usage in prcomp",
>
> I've started looking into adding an optional 'rank.' argument
> to prcomp allowing to more efficiently get only a few PCs
> instead of the full p PCs, say when p = 1000 and you know you
> only want 5 PCs.
>
> (https://stat.ethz.ch/pipermail/r-help/2016-March/437228.html)
>
> As it was mentioned, we already have an optional 'tol' argument
> which allows *not* to choose all PCs.
>
> When I do that,
> say
>
> C <- chol(S <- toeplitz(.9 ^ (0:31))) # Cov.matrix and its root
> all.equal(S, crossprod(C))
> set.seed(17)
> X <- matrix(rnorm(32000), 1000, 32)
> Z <- X %*% C ## ==> cov(Z) ~= C'C = S
> all.equal(cov(Z), S, tol = 0.08)
> pZ <- prcomp(Z, tol = 0.1)
> summary(pZ) # only ~14 PCs (out of 32)
>
> I get for the last line, the summary.prcomp(.) call :
>
> > summary(pZ) # only ~14 PCs (out of 32)
> Importance of components:
>                           PC1    PC2    PC3    PC4     PC5     PC6     PC7     PC8
> Standard deviation     3.6415 2.7178 1.8447 1.3943 1.10207 0.90922 0.76951 0.67490
> Proportion of Variance 0.4352 0.2424 0.1117 0.0638 0.03986 0.02713 0.01943 0.01495
> Cumulative Proportion  0.4352 0.6775 0.7892 0.8530 0.89288 0.92001 0.93944 0.95439
>                            PC9    PC10    PC11    PC12    PC13   PC14
> Standard deviation     0.60833 0.51638 0.49048 0.44452 0.40326 0.3904
> Proportion of Variance 0.01214 0.00875 0.00789 0.00648 0.00534 0.0050
> Cumulative Proportion  0.96653 0.97528 0.98318 0.98966 0.99500 1.0000
> >
>
> which computes the *proportions* as if there were only 14 PCs in
> total (but there were 32 originally).
>
> I would think that the summary should or could in addition show
> the usual "proportion of variance explained" like result which
> does involve all 32 variances or std.dev.s ... which are
> returned from the svd() anyway, even in the case when I use my
> new 'rank.' argument which only returns a "few" PCs instead of
> all.
>
> Would you think the current summary() output is good enough or
> rather misleading?
>
> I think I would want to see (possibly in addition) proportions
> with respect to the full variance and not just to the variance
> of those few components selected.
>
> Opinions?
>
> Martin Maechler
> ETH Zurich
>
> ______________________________________________
> R-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>
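
The proportions "with respect to the full variance" that Martin asks for
above can be sketched in base R as follows. This assumes the default
centering without scaling, so that the total variance is the sum of the
column variances of Z; the names total_var, prop_full and prop_kept are
made up for the illustration, and this is not how summary.prcomp() is
implemented.

```r
# Rebuild the example data from the quoted message
C <- chol(S <- toeplitz(.9 ^ (0:31)))
set.seed(17)
X <- matrix(rnorm(32000), 1000, 32)
Z <- X %*% C

pZ <- prcomp(Z, tol = 0.1)
k  <- ncol(pZ$rotation)            # number of PCs actually kept

# Total variance = trace of cov(Z) = sum of the column variances
total_var <- sum(apply(Z, 2, var))

var_k     <- pZ$sdev[seq_len(k)]^2 # variances of the kept PCs
prop_full <- var_k / total_var     # relative to all 32 components
prop_kept <- var_k / sum(var_k)    # relative to the kept PCs only

round(rbind(prop_full,
            cum_full = cumsum(prop_full),
            prop_kept), 4)
```

Indexing sdev by seq_len(k) keeps the sketch independent of whether
pZ$sdev holds only the kept standard deviations or all of them. The
cumulative row cum_full stops short of 1 whenever PCs are dropped, which
is the information missing from the summary() output quoted above.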