[R] Aggregate Values for All Levels of a Factor

Marc Schwartz MSchwartz at mn.rr.com
Fri Oct 6 01:13:27 CEST 2006


On Thu, 2006-10-05 at 15:44 -0700, Kaom Te wrote:
> Hello,
> 
> I'm a novice user trying to figure out how to retain NA aggregate
> values. For example, given a data frame with data for 3 of the 4
> possible factor colors ("orange" is omitted from the data frame), I want
> to calculate the average height by color, but I'd like to retain the
> knowledge that "orange" is a possible level; it's just missing. Here is
> the example code:
> 
> > data <- data.frame(color = factor(c("blue","red","red","green","blue"),
>                                     levels = c("blue","red","green","orange")),
>                      height = c(2,8,4,4,5))
> > aggregate(data$height, list(color = data$color), mean)
>   color   x
> 1  blue 3.5
> 2   red 6.0
> 3 green 4.0
> >
> 
> Instead I would like to get
> 
>    color   x
> 1   blue 3.5
> 2    red 6.0
> 3  green 4.0
> 4 orange  NA
> 
> Is this possible? I've read as much documentation as I can find, but am
> unable to find the solution. It seems like something people would need
> to do, so I assume it must be built in somewhere. Or do I need to
> write my own version of aggregate?
> 
> Thanks in advance,
> Kaom

If you review the Details section of ?aggregate, you will note:

  "Empty subsets are removed, ..."

Thus, one approach is to use tapply(), which returns NA for empty
factor levels instead of dropping them:

tmp <- tapply(data$height, data$color, mean, na.rm = TRUE)

> tmp
  blue    red  green orange
   3.5    6.0    4.0     NA

# Convert to a data frame; row.names replaces the level names with 1..n
DF <- data.frame(color = names(tmp), mean.height = tmp,
                 row.names = seq_along(tmp))

> DF
   color mean.height
1   blue         3.5
2    red         6.0
3  green         4.0
4 orange          NA
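
If you would rather keep aggregate() in the pipeline, a different route
(a rough sketch, not the tapply() approach above) is to merge its result
against the full set of factor levels. The object names all_levels, agg
and res are made up for illustration, and the data frame is simply the
one from the original post:

```r
# Same example data as in the question
data <- data.frame(color = factor(c("blue", "red", "red", "green", "blue"),
                                  levels = c("blue", "red", "green", "orange")),
                   height = c(2, 8, 4, 4, 5))

# aggregate() drops the empty "orange" subset
agg <- aggregate(data$height, list(color = data$color), mean)

# One row per possible level, whether or not it occurs in the data
all_levels <- data.frame(color = levels(data$color))

# Keep every row of all_levels; levels with no match in agg get NA for x
res <- merge(all_levels, agg, by = "color", all.x = TRUE)

# merge() sorts by the key, so restore the original level order
res <- res[match(levels(data$color), res$color), ]
rownames(res) <- NULL
res
```

This should give one row per level, with NA in x for "orange". The
tapply() version is shorter; the merge() version may be handier if you
already have aggregate() output with several columns.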


HTH,

Marc Schwartz


