[R] aggregating data and missing values

Wed Nov 2 14:34:01 CET 2005

On 11/2/05, Pascal A. Niklaus <Pascal.Niklaus at unibas.ch> wrote:
> Hi all,
>
> I would like to aggregate a large data file that is defined by a number of
> factors and associated values. The point is that not all factor level
> combinations are present in the data file  -- these "missing" values are in
> fact to be treated as zeroes.
>
> Is there a straightforward way to either
> a) expand the existing data set so that the missing factor combinations
> are added, or
> b) use an "aggregate" function that generates a row of data for every
> factor combination?
>
> Here is an example:
>
> a) "complete" data set:
>
> > example <-
> data.frame(f1=factor(rep(LETTERS[1:3],each=4)),f2=factor(letters[1:2]),d=1:12)
> > aggregate(cbind(d=example$d),by=list(f1=example$f1,f2=example$f2),sum)
>  f1 f2  d
> 1  A  a  4
> 2  B  a 12
> 3  C  a 20
> 4  A  b  6
> 5  B  b 14
> 6  C  b 22
>
> b) data set with "missing combinations":
>
> > example2 <- example[c(-10,-12),]
> > aggregate(cbind(d=example2$d),by=list(f1=example2$f1,f2=example2$f2),sum)
>  f1 f2  d
> 1  A  a  4
> 2  B  a 12
> 3  C  a 20
> 4  A  b  6
> 5  B  b 14
>
> Here, I would like to have the missing row with f1=C, f2=b, d=NA.

Suppose the result of the aggregate just shown is stored in example2.ag.
Merging it with the full grid of factor combinations adds the missing
rows, with d set to NA:

merge(example2.ag, expand.grid(f1 = LETTERS[1:3], f2 = letters[1:2]),
      all = TRUE)
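
For completeness, here is a minimal sketch of the whole sequence, rebuilding example2 from the post above. The name example2.ag follows the text above; "full" is only an illustrative name. The last step assumes the NA should end up as 0, since the question says the missing combinations are really zeroes; skip it if NA is what you want.

```r
# Rebuild the data from the original post and drop the rows with f1 = C, f2 = b
example <- data.frame(f1 = factor(rep(LETTERS[1:3], each = 4)),
                      f2 = factor(letters[1:2]),
                      d  = 1:12)
example2 <- example[c(-10, -12), ]

# Aggregate as in the post; the combination (C, b) is absent from the result
example2.ag <- aggregate(cbind(d = example2$d),
                         by = list(f1 = example2$f1, f2 = example2$f2),
                         sum)

# Merge with every factor combination; unmatched combinations get d = NA
full <- merge(example2.ag,
              expand.grid(f1 = LETTERS[1:3], f2 = letters[1:2]),
              all = TRUE)

# Assumption: missing combinations are to be treated as zeroes
full$d[is.na(full$d)] <- 0

print(full)
```

Building the grid from levels(example2$f1) and levels(example2$f2) instead of typing the levels by hand should give the same result here, because subsetting a data frame keeps unused factor levels.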