# [R] aggregating data and missing values

Pascal A. Niklaus Pascal.Niklaus at unibas.ch
Wed Nov 2 13:59:55 CET 2005

```
Hi all,

I would like to aggregate a large data file that is defined by a number of
factors and associated values. The point is that not all factor level
combinations are present in the data file -- these "missing" values are in
fact to be treated as zeroes.

Is there a straightforward way to either
a) expand the existing data set so that the missing factor combinations
are added, or
b) use an "aggregate" function that generates a row of data for every given
factor combination?

Here is an example:

a) "complete" data set:

> example <- data.frame(f1=factor(rep(LETTERS[1:3],each=4)),
+                       f2=factor(letters[1:2]),d=1:12)
> aggregate(cbind(d=example$d),by=list(f1=example$f1,f2=example$f2),sum)
  f1 f2  d
1  A  a  4
2  B  a 12
3  C  a 20
4  A  b  6
5  B  b 14
6  C  b 22

b) data set with "missing combinations":

> example2 <- example[c(-10,-12),]
> aggregate(cbind(d=example2$d),by=list(f1=example2$f1,f2=example2$f2),sum)
  f1 f2  d
1  A  a  4
2  B  a 12
3  C  a 20
4  A  b  6
5  B  b 14

Here, I would like to have the missing row with f1=C, f2=b, d=NA.

The solution I have come up with is very slow and cumbersome (because there
are many factors), and I am convinced that there is a better way to do this (I
create a new data frame with all factor combinations present and then copy
the results from the call to aggregate line by line into the new data frame).

Thanks for your help

Pascal

```
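One possible way to get a row for every factor combination, sketched here under assumptions rather than offered as the definitive answer: build the full grid of factor levels with `expand.grid` and left-join the aggregated sums onto it with `merge`, so the copy is done in one vectorised step instead of line by line. The names `agg`, `grid`, `full` and `full0` are made up for this illustration; `example` and `example2` are rebuilt from the post.

```r
# Rebuild the data from the post (f2 is recycled: a, b, a, b, ...)
example <- data.frame(f1 = factor(rep(LETTERS[1:3], each = 4)),
                      f2 = factor(letters[1:2]),
                      d  = 1:12)
example2 <- example[c(-10, -12), ]  # removes every row with f1 = C, f2 = b

# Sum d over the factor combinations that are actually present
agg <- aggregate(example2["d"], by = example2[c("f1", "f2")], FUN = sum)

# All combinations of the factor levels, including the absent ones
grid <- expand.grid(f1 = levels(example2$f1), f2 = levels(example2$f2))

# Left join on the shared columns f1 and f2; absent combinations get d = NA
full <- merge(grid, agg, all.x = TRUE)

# Optional: treat the absent combinations as zeroes instead of NA
full0 <- full
full0$d[is.na(full0$d)] <- 0
```

`full` then holds six rows, with `d` set to `NA` for f1=C, f2=b, and `full0` holds the same rows with that entry set to zero. With many factors the grid grows as the product of the level counts, so whether this is fast enough for the real data set is untested here.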