[Rd] areaplot

Mon Jun 14 23:50:11 CEST 2010

I would like to propose adding a new plot function to the 'graphics' 
package. The new function is called areaplot() and I have implemented it 
as a generic function that supports a variety of data classes.

An area plot consists of a simple line, like plot(x, y, type="l"), except 
the area between 0 and the line is a filled polygon. Areas can be stacked 
on top of each other, like barplot(matrix(3:1)), and the data can be 
plotted as proportions, so stacked areas equal 1.

Area plots are commonly found in the scientific literature. Currently, 
drawing an area plot in R using polygon() is a hassle at best, and drawing 
a stacked area plot can be a major task for beginning users.

If you source the attached areaplot.R file, and read the help page and 
examples, you will see that I made an effort to implement the following 
features:

- the same look as standard R plots, including default xlab and ylab

- the same graphical args as standard R plots, including add=TRUE

- the same robustness to NA values as standard R plots

- a generic function that supports vectors, tables, matrices, data frames, 
lists, time-series objects, and formulas

On a technical note, I included the support for different data classes 
inside areaplot.default(), instead of implementing separate areaplot.foo() 
functions. It would be trivial to do move the if-clauses to separate 
functions, but I wasn't sure if that would improve the code or not, since 
the necessary data manipulations are minimal. It was only areaplot.formula 
that should obviously be a separate function.

Comparable functions are found in some user packages, highlighting the 
general need of an area plot function, but the existing functions do not 
follow the R plotting standards, nor do they support such a variety of 
data classes, or provide stacked and proportional area plots.

I'm looking forward to your feedback,

Arni
-------------- next part --------------
areaplot <-

function(x, ...)

{

  UseMethod("areaplot")

}

areaplot.default <-

function(x, y=NULL, prop=FALSE, add=FALSE, xlab=NULL, ylab=NULL, col=NULL, ...)

{

  if(is.ts(x)) # ts/mts

  {

    if(is.null(ylab))

      ylab <- deparse(substitute(x))

    x <- data.frame(Time=time(x), x)

  }

  if(is.table(x))  # table

  {

    if(is.null(ylab))

      ylab <- deparse(substitute(x))

    if(length(dim(x)) == 1)

      x <- t(t(unclass(x)))

    else

      x <- unclass(x)

  }

  if(is.matrix(x))  # matrix

  {

    if(!is.null(rownames(x)) && !any(is.na(suppressWarnings(as.numeric(rownames(x))))))

    {

      x <- data.frame(as.numeric(rownames(x)), x)

      names(x)[1] <- ""

    }

    else

    {

      x <- data.frame(Index=seq_len(nrow(x)), x)

    }

  }

  if(is.list(x))  # data.frame or list

  {

    if(is.null(xlab))

      xlab <- names(x)[1]

    if(is.null(ylab))

    {

      if(length(x) == 2)

        ylab <- names(x)[2]

      else

        ylab <- ""

    }

    y <- x[-1]

    x <- x[[1]]

  }

  if(is.null(y))  # one numeric vector passed, plot it on 1:n

  {

    if(is.null(xlab))

      xlab <- "Index"

    if(is.null(ylab))

      ylab <- deparse(substitute(x))

    y <- x

    x <- seq_along(x)

  }

  if(is.null(xlab))

    xlab <- deparse(substitute(x))

  if(is.null(ylab))

    ylab <- deparse(substitute(y))

  y <- as.matrix(y)

  if(is.null(col))

    col <- gray.colors(ncol(y))

  col <- rep(col, length.out=ncol(y))

  if(prop)

    y <- prop.table(y, 1)

  y <- t(rbind(0, apply(y, 1, cumsum)))

  na <- is.na(x) | apply(is.na(y),1,any)

  x <- x[!na][order(x[!na])]

  y <- y[!na,][order(x[!na]),]

  if(!add)

    suppressWarnings(matplot(x, y, type="n", xlab=xlab, ylab=ylab, ...))

  xx <- c(x, rev(x))

  for(i in 1:(ncol(y)-1))

  {

    yy <- c(y[,i+1], rev(y[,i]))

    suppressWarnings(polygon(xx, yy, col=col[i], ...))

  }

  invisible(y[,-1])

}

areaplot.formula <-

function (formula, data, subset, na.action=NULL, ...)

{

  m <- match.call(expand.dots=FALSE)

  if(is.matrix(eval(m$data,parent.frame())))

    m$data <- as.data.frame(data)

  m$... <- NULL

  m[[1]] <- as.name("model.frame")

  if(as.character(formula[[2]]=="."))

  {

    rhs <- unlist(strsplit(deparse(formula[[3]])," *[:+] *"))

    lhs <- sprintf("cbind(%s)", paste(setdiff(names(data),rhs),collapse=","))

    m[[2]][[2]] <- parse(text=lhs)[[1]]

  }

  mf <- eval(m, parent.frame())

  if(is.matrix(mf[[1]]))

  {

    lhs <- as.data.frame(mf[[1]])

    names(lhs) <- as.character(m[[2]][[2]])[-1]

    areaplot.default(cbind(mf[-1],lhs), ...)

  }

  else

  {

    areaplot.default(mf[2:1], ...)

  }

}

-------------- next part --------------
\name{areaplot}

\alias{areaplot}

\alias{areaplot.default}

\alias{areaplot.formula}

\title{Area Plots}

\description{

  Produce a stacked area plot, or add polygons to an existing plot.

}

\usage{

areaplot(x, \dots)

\method{areaplot}{default}(x, y = NULL, prop = FALSE, add = FALSE, xlab = NULL,

         ylab = NULL, col = NULL, \dots)

\method{areaplot}{formula}(formula, data, subset, na.action = NULL, \dots)

}

\arguments{

  \item{x}{numeric vector of x values, or if \code{y=NULL} a numeric

    vector of y values. Can also be a 1-dimensional table (x values in

    names, y values in array), matrix or 2-dimensional table (x values

    in row names and y values in columns), a data frame (x values in

    first column and y values in subsequent columns), or a time-series

    object of class \code{ts/mts}.}

  \item{y}{numeric vector of y values, or a matrix containing y values

    in columns.}

  \item{prop}{whether data should be plotted as proportions, so stacked

    areas equal 1.}

  \item{add}{whether polygons should be added to an existing plot.}

  \item{xlab}{label for x axis.}

  \item{ylab}{label for y axis.}

  \item{col}{fill color of polygon(s). The default is a vector of gray

    colors.}

  \item{formula}{a \code{\link{formula}}, such as \code{y ~ x} or

    \code{cbind(y1, y2) ~ x}, specifying x and y values. A dot on the

    left-hand side, \code{formula = . ~ x}, means all variables except

    the one specified on the right-hand side.}

  \item{data}{a data frame (or list) from which the variables in

    \code{formula} should be taken.}

  \item{subset}{an optional vector specifying a subset of observations

    to be used.}

  \item{na.action}{a function which indicates what should happen when

    the data contain \code{NA} values. The default is to ignore missing

    values in the given variables.}

  \item{\dots}{further arguments passed to \code{matplot} and

    \code{polygon}.}

}

\value{

  Matrix of cumulative sums that was used for plotting.

}

\author{

  Arni Magnusson.

}

\seealso{

  \code{\link{barplot}}, \code{\link{polygon}}.

}

\examples{

areaplot(rpois(10,40))

areaplot(rnorm(10))

# formula

areaplot(Armed.Forces~Year, data=longley)

areaplot(cbind(Armed.Forces,Unemployed)~Year, data=longley)

# add=TRUE

plot(1940:1970, 500*runif(31), ylim=c(0,500))

areaplot(Armed.Forces~Year, data=longley, add=TRUE)

# matrix

areaplot(WorldPhones)

areaplot(WorldPhones, prop=TRUE)

# table

require(MASS)

areaplot(table(Aids2$age))

areaplot(table(Aids2$age, Aids2$sex))

# ts/mts

areaplot(austres)

areaplot(Seatbelts[,c("drivers","front","rear")],

         ylab="Killed or seriously injured")

abline(v=1983+1/12, lty=3)

}

\keyword{kwd1}

\keyword{kwd2}