[R] Color points in princomp() plot

Boris Steipe boris.steipe at utoronto.ca
Tue Apr 29 04:36:55 CEST 2014


At first I thought this is easy: use the same strategy you use for all of these types of questions. Plot with type="n" to suppress output of points, then use points() or text() to get the colors and shapes and whatnot you need ...

Turns out biplot() does not use a type argument ...

... and you can't simply fake it by setting cex to a really small value, because biplot() actually creates two plots: one for the scores, then one for the axes. And once you have plotted the axes, the axis limits for the plot have changed, and you can't easily apply points() anymore.

... and you can't easily rescale, because ... well, have a look at the code.

The code, incidentally can be found with getAnywhere(<function.name>).

So what's the solution ...

Since I've wanted this capability for some time, I modified the original biplot() to accept a type parameter type={"t" (Default) | "p" | "n"}. For "t", the function behaves almost exactly as before. For "p" it plots points, and should accept all the usual arguments for that. For "n" it plots nothing except the axes. You can then add the points as desired.

I also added two parameters col.arrows = "red", and col.text = "black" to have extra control.

Here is an example. (Note, you have to load the function, below, first.





library(MASS)
data(crabs)
PRC <- prcomp(crabs[, 4:8])

myBiplot(PRC)
myBiplot(PRC, choices=2:3, cex = 0.7, col.text="#445599") # much as before


# use filled points, color by the value found in column 4 of the data
r <- range(crabs[,4]) 
n <- 15
cv <- cm.colors(n)  
c <- cv[cut(crabs[,4],n)]  
myBiplot(PRC, choices=2:3, type = "p", pch=20, col=c, col.arrows = "#FF6600")


# finally: plot nothing then use points() for detailed control 
myBiplot(PRC, choices=2:3, type = "n")  # no points

# blue/orange crab males/females - as given by rows in the data
points(PRC$x[  1: 50, 2:3], pch=21, bg="#0066FF")
points(PRC$x[ 51:100, 2:3], pch=24, bg="#0066FF")
points(PRC$x[101:150, 2:3], pch=21, bg="#FF6600")
points(PRC$x[151:200, 2:3], pch=24, bg="#FF6600")




Enjoy,
Boris

==============================================================
myBiplot <- function (x, choices = 1L:2L, scale = 1,
                      pc.biplot = FALSE, var.axes = TRUE,
                      type = "t",
                      col,
                      col.arrows = "#FF0000",
                      col.text = "#000000",
                      cex = rep(par("cex"), 2),
                      expand = 1, 
                      xlabs = NULL, ylabs = NULL,
                      xlim = NULL, ylim = NULL, 
                      main = NULL, sub = NULL,
                      xlab = NULL, ylab = NULL, 
                      arrow.len = 0.1,
                      ...
                      )

{
	if (length(choices) != 2L) 
        stop("length of choices must be 2")
    if (!length(scores <- x$x)) 
        stop(gettextf("object '%s' has no scores", deparse(substitute(x))), 
            domain = NA)
    if (is.complex(scores)) 
        stop("biplots are not defined for complex PCA")
        
    lam <- x$sdev[choices]
    n <- NROW(scores)
    lam <- lam * sqrt(n)
    
    if (scale < 0 || scale > 1) 
        warning("'scale' is outside [0, 1]")
    if (scale != 0) 
        lam <- lam^scale
    else lam <- 1
    if (pc.biplot) 
        lam <- lam/sqrt(n)
        
    y <- t(t(x$rotation[, choices]) * lam)
    x <- t(t(scores[, choices])/lam)  # note that from here on
                                      # x is no longer the PC object
                                      # originally pased into the function
    n <- nrow(x)
    p <- nrow(y)
    
    if (missing(xlabs)) {
        xlabs <- dimnames(x)[[1L]]
        if (is.null(xlabs)) 
            xlabs <- 1L:n
    }
    xlabs <- as.character(xlabs)
    dimnames(x) <- list(xlabs, dimnames(x)[[2L]])
    
    if (missing(ylabs)) {
        ylabs <- dimnames(y)[[1L]]
        if (is.null(ylabs)) 
            ylabs <- paste("Var", 1L:p)
    }
    ylabs <- as.character(ylabs)
    dimnames(y) <- list(ylabs, dimnames(y)[[2L]])

    if (length(cex) == 1L) 
        cex <- c(cex, cex)

    unsigned.range <- function(x) c(-abs(min(x, na.rm = TRUE)), 
        abs(max(x, na.rm = TRUE)))
    rangx1 <- unsigned.range(x[, 1L])
    rangx2 <- unsigned.range(x[, 2L])
    rangy1 <- unsigned.range(y[, 1L])
    rangy2 <- unsigned.range(y[, 2L])

    if (missing(xlim) && missing(ylim)) 
        xlim <- ylim <- rangx1 <- rangx2 <- range(rangx1, rangx2)
    else if (missing(xlim)) 
        xlim <- rangx1
    else if (missing(ylim)) 
        ylim <- rangx2

    ratio <- max(rangy1/rangx1, rangy2/rangx2)/expand
    on.exit(par(op))
    op <- par(pty = "s")
    if (!is.null(main)) 
        op <- c(op, par(mar = par("mar") + c(0, 0, 1, 0)))

    # first, plot scores - either normally, or as row labels
    if (type == "p") {
	    plot(x, type = type, xlim = xlim, ylim = ylim, col = col, 
        xlab = xlab, ylab = ylab, sub = sub, main = main, ...)
    }
    else if (type == "t") {
        plot(x, type = "n", xlim = xlim, ylim = ylim, 
             xlab = xlab, ylab = ylab, sub = sub, main = main, ...)
        text(x, xlabs, cex = cex[1L], col = col.text, ...)
    	
    }
    else if (type == "n") {  # plot an empty frame
	    plot(x, type = type, xlim = xlim, ylim = ylim, 
        xlab = xlab, ylab = ylab, sub = sub, main = main, ...)
    }

    par(new = TRUE)
    dev.hold()
    on.exit(dev.flush(), add = TRUE)
    plot(y, axes = FALSE, type = "n", xlim = xlim * ratio, ylim = ylim * 
        ratio, xlab = "", ylab = "", col = col.arrows, ...)
    axis(3, col = col.arrows, ...)
    axis(4, col = col.arrows, ...)
    box(col = "#000000")
    text(y, labels = ylabs, cex = cex[2L], col = col.arrows, ...)
    if (var.axes) 
        arrows(0, 0, y[, 1L] * 0.8, y[, 2L] * 0.8, col = col.arrows, 
            length = arrow.len)
    # now replot into xlim, ylim scaled by lam, to reset par("usr")
    # to the  correct values needed for subsequent application of points(),
    # text() etc.
    par(new = TRUE)
    dev.hold()
    on.exit(dev.flush(), add = TRUE)
	plot(0, type = "n", xlim = xlim * lam[1], ylim = ylim * lam[2], 
    xlab = '', ylab = '', sub = '', main = '', xaxt='n', yaxt='n', axes=FALSE)

    invisible()

}

==============================================================


On 2014-04-28, at 3:00 PM, Noah Silverman wrote:

> Hi,
> 
> The princomp() function has a nice method to generate a pretty biplot of
> the data.  I have some data where the points are divided into a few
> groups.  My intention is to generate a biplot, using princomp, but color
> the points based on group membership.  The hope is that points will show
> some kind of clustering by group in the biplot.
> 
> However, I can't figure out how to indicate individual point color in a
> biplot.  The only setting I found was to indicate a single color for
> *all* the points.
> 
> Any ideas?
> 
> 
> -- 
> *Noah Silverman, PhD* | UCLA Department of Statistics
> 8117 Math Sciences Building, Los Angeles, CA 90095
> 
> 	[[alternative HTML version deleted]]
> 
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.



More information about the R-help mailing list