[R] aggregating data and missing values

Pascal A. Niklaus Pascal.Niklaus at unibas.ch
Wed Nov 2 13:59:55 CET 2005


Hi all,

I would like to aggregate a large data file consisting of a number of factors
and associated values. The problem is that not all combinations of factor
levels are present in the data file; these "missing" combinations are in fact
to be treated as zeroes.

Is there a straightforward way to either
a) expand the existing data set so that the missing factor combinations are
added, or
b) use an "aggregate" function that generates a row of data for every given
combination of factor levels?
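A minimal sketch of option a), using the example data from this message. It assumes that every factor column is a grouping factor and that an absent combination should contribute 0 to the sum. It builds all level combinations with expand.grid(), left-joins the data onto them with merge(all.x = TRUE), replaces the resulting NAs by 0 and only then calls aggregate():

```r
# Example data from this message: the combination f1 = "C", f2 = "b" is absent
example <- data.frame(f1 = factor(rep(LETTERS[1:3], each = 4)),
                      f2 = factor(letters[1:2]), d = 1:12)
example2 <- example[c(-10, -12), ]

# Assumption: every factor column is a grouping factor
fac <- names(example2)[sapply(example2, is.factor)]

# One row per combination of factor levels (3 x 2 = 6 rows here)
grid <- expand.grid(lapply(example2[fac], levels))

# Left join on the factor columns: absent combinations get d = NA ...
expanded <- merge(grid, example2, all.x = TRUE)
# ... which are then treated as zeroes
expanded$d[is.na(expanded$d)] <- 0

res <- aggregate(cbind(d = expanded$d),
                 by = list(f1 = expanded$f1, f2 = expanded$f2), sum)
res
```

Because the factor columns are found with sapply(), the same lines should work unchanged with more factors; only the by= list passed to aggregate() has to name them.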

Here is an example:

a) "complete" data set:

> example <- data.frame(f1=factor(rep(LETTERS[1:3],each=4)),
+                       f2=factor(letters[1:2]),d=1:12)
> aggregate(cbind(d=example$d),by=list(f1=example$f1,f2=example$f2),sum)
  f1 f2  d
1  A  a  4
2  B  a 12
3  C  a 20
4  A  b  6
5  B  b 14
6  C  b 22

b) data set with "missing combinations":

> example2 <- example[c(-10,-12),]
> aggregate(cbind(d=example2$d),by=list(f1=example2$f1,f2=example2$f2),sum)
  f1 f2  d
1  A  a  4
2  B  a 12
3  C  a 20
4  A  b  6
5  B  b 14

Here, I would like to also have the missing row with f1=C, f2=b, d=NA.
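One way to obtain exactly this output, sketched under the assumption that the factor levels stored in example2 define the complete set of combinations: aggregate first as above, then left-join the result onto a grid of all level combinations built with expand.grid(), so that combinations without data keep d = NA:

```r
example <- data.frame(f1 = factor(rep(LETTERS[1:3], each = 4)),
                      f2 = factor(letters[1:2]), d = 1:12)
example2 <- example[c(-10, -12), ]

# Aggregate as before: only the 5 combinations present in the data appear
agg <- aggregate(cbind(d = example2$d),
                 by = list(f1 = example2$f1, f2 = example2$f2), sum)

# Every combination of factor levels, including the absent one (C, b)
grid <- expand.grid(f1 = levels(example2$f1), f2 = levels(example2$f2))

# Left join on f1 and f2: combinations with no aggregated value get d = NA
full <- merge(grid, agg, all.x = TRUE)
full
```

Subsetting rows with example[c(-10,-12),] does not drop unused factor levels, which is why levels() still returns "C" and "b" here.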

The solution I have come up with is very slow and cumbersome (because there
are many factors), and I am convinced that there is a better way to do this:
I create a new data frame with all factor combinations present and then copy
the results from the call to aggregate line by line into the new data frame.
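If the missing combinations really are meant as zeroes, the line-by-line copy can probably be avoided altogether. A sketch using xtabs(), which sums the response over every combination of factor levels and fills empty cells with 0; as.data.frame() then turns the table back into one row per combination (responseName sets the name of the value column):

```r
example <- data.frame(f1 = factor(rep(LETTERS[1:3], each = 4)),
                      f2 = factor(letters[1:2]), d = 1:12)
example2 <- example[c(-10, -12), ]

# Sum d over all f1 x f2 combinations; absent combinations become 0
tab <- xtabs(d ~ f1 + f2, data = example2)

# Back to a long data frame with one row per combination (6 rows here)
res <- as.data.frame(tab, responseName = "d")
res
```

With many factors, they can all be listed on the right-hand side of the formula. If NA rather than 0 is wanted for absent combinations, tapply(example2$d, example2[c("f1", "f2")], sum) returns an array with NA in those cells.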

Thanks for your help

Pascal




More information about the R-help mailing list