# [R] aggregating data and missing values

Pascal A. Niklaus Pascal.Niklaus at unibas.ch
Wed Nov 2 13:59:55 CET 2005

```
Hi all,

I would like to aggregate a large data file that is defined by a number of
factors and associated values. The point is that not all factor level
combinations are present in the data file  -- these "missing" values are in
fact to be treated as zeroes.

Is there a straightforward way to either
a) expand the existing data set so that the missing factor combinations are
included, or
b) use an "aggregate" function that generates a row of data for every factor
combination?

Here is an example:

a) "complete" data set:

> example <- data.frame(f1=factor(rep(LETTERS[1:3],each=4)),
+                       f2=factor(letters[1:2]),d=1:12)
> aggregate(cbind(d=example$d),by=list(f1=example$f1,f2=example$f2),sum)
  f1 f2  d
1  A  a  4
2  B  a 12
3  C  a 20
4  A  b  6
5  B  b 14
6  C  b 22

b) data set with "missing combinations":

> example2 <- example[c(-10,-12),]
> aggregate(cbind(d=example2$d),by=list(f1=example2$f1,f2=example2$f2),sum)
  f1 f2  d
1  A  a  4
2  B  a 12
3  C  a 20
4  A  b  6
5  B  b 14

Here, I would like to have the missing row with f1=C, f2=b, d=NA.

The solution I have come up with is very slow and cumbersome (because there
are many factors), and I am convinced that there is a better way to do this. (I
create a new data frame with all factor combinations present and then copy
the results from the call to aggregate line by line into the new data frame.)
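
# Editorial sketch, not part of the original post: one possible way to avoid
# the line-by-line copy described above. It assumes base R only. The idea is
# to build every factor combination with expand.grid() and left-join the
# aggregated sums onto it with merge(all.x = TRUE), so that combinations
# absent from the data come back as NA. The names `sums`, `grid` and `full`
# are illustrative assumptions.

# Rebuild the example data from above so this sketch runs on its own.
example  <- data.frame(f1 = factor(rep(LETTERS[1:3], each = 4)),
                       f2 = factor(letters[1:2]), d = 1:12)
example2 <- example[c(-10, -12), ]

# Sums for the combinations that are actually present.
sums <- aggregate(example2["d"], by = example2[c("f1", "f2")], FUN = sum)

# All combinations of the factor levels, whether present in the data or not.
grid <- expand.grid(f1 = levels(example2$f1), f2 = levels(example2$f2))

# Left join on the shared columns f1 and f2; unmatched rows get d = NA.
full <- merge(grid, sums, all.x = TRUE)
full <- full[order(full$f2, full$f1), ]

# If the missing combinations should count as zeroes instead of NA:
# full$d[is.na(full$d)] <- 0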