[R] biplot and prcomp

Jari Oksanen jarioksa at sun3.oulu.fi
Fri May 9 15:20:41 CEST 2003


On Fri, 2003-05-09 at 15:27, Hector L. Ayala wrote:
> R users
>    As was mentioned before, the princomp() function in R 1.7.0 will 
> complain if the data set has more columns than rows.  The alternative 
> functions to generate a PCA analysis that will not complain about such type 
> of data set are prcomp() or pca() (multiv library).  Although the results 
> of both of them are almost identical, the outputs are not compatible with 
> the biplot() function from mva.  Does anybody know if there is an 
> alternative way to generate the biplot??  I can generate a two dimensional 
> graph with the sample scores by plotting the values from the object where 
> the results are, but how do I plot the arrows that represent the loadings???
> 
biplot function (1) plots row and column scores in the same plot, (2)
scales both row and column scores in some peculiar ways, and (3) uses
different axis systems for these column and row scores. In principle you
can do all this using plotting primitives such as plot, text (or points)
and arrows. As an alternative, you can edit biplot.princomp so that it
uses prcomp objects instead of princomp objects: chanse x$scores to x$x,
x$loadings to x$rotation and x$n.obs to nrow(x$x). I think that prcomp
and princomp define eigenvalues (x$sdev) differently: one uses biased
another unbiased variances. I don't remember which one uses which. You
can either ignore this or adjust x$sdev like needed. As a net result the
following minimal edition may do what you need:

biplot.prcomp <- 
function (x, choices = 1:2, scale = 1, pc.biplot = FALSE, ...)
{
    if (length(choices) != 2)
        stop("length of choices must be 2")
    if (!length(scores <- x$x))
        stop(paste("object", deparse(substitute(x)), "has no scores"))
    lam <- x$sdev[choices]
    if (is.null(n <- nrow(x$x)))
        n <- 1
    lam <- lam * sqrt(n)
    if (scale < 0 || scale > 1)
        warning("scale is outside [0, 1]")
    if (scale != 0)
        lam <- lam^scale
    else lam <- 1
    if (pc.biplot)
        lam <- lam/sqrt(n)
    biplot.default(t(t(scores[, choices])/lam), t(t(x$rotation[,
        choices]) * lam), ...)
    invisible()
}

A more flexible alternative that automatically adapts to changes in
biplot.princomp would be:

> biplot.prcomp <- 
function(x, ...)
{
   tmp <- list()
   tmp$scores <- x$x
   tmp$loadings <- x$rotation
   tmp$sdev <- x$sdev
   tmp$n.obs <- nrow(x$scores)
   class(tmp) <- "princomp"
   biplot(tmp, ...)
   invisible()
}

I haven't tested this, but it draws a graph (correct or incorrect, I
don't know).

As an alternative, you may consider using function rda in the vegan
package. This is for Redundancy Analysis sensu ter Braak, but without
constraints it will be yet another PCA (and it uses svd like prcomp).
This has a plot function which will put both row and column results into
same graph. The scaling is very peculiar, but as the author (Prof.
Ripley?) writes in ?biplot.princomp "There is considerable confusion
over the precise definitions", so these definitions might be tolerated
as well. At least the rda plots are scaled so that the results are a
graphical approximation to the data, i.e., cross product of used scores
will give a least squares approximation of the data. Difference to
biplot.princomp scaling is that same scale is used for both sets of
scores instead of using separate axis systems. Moreover, no arrows are
used in graphs but both row and column scores are defined by points or
text strings.

cheers, jari oksanen
-- 
Jari Oksanen -- Dept Biology, Univ Oulu, 90014 Oulu, Finland
Ph. +358 8 5531526, cell +358 40 5136529, fax +358 8 5531061
email jari.oksanen at oulu.fi, homepage http://cc.oulu.fi/~jarioksa/




More information about the R-help mailing list