[R] aggregating data and missing values
Pascal A. Niklaus
Pascal.Niklaus at unibas.ch
Wed Nov 2 13:59:55 CET 2005
Hi all,
I would like to aggregate a large data file that is defined by a number of
factors and associated values. The point is that not all factor level
combinations are present in the data file -- these "missing" values are in
fact to be treated as zeroes.
Is there a straightforward way to either
a) expand the existing data set so that the missing factor combinations
are added, or
b) use an "aggregate"-like function that generates a row of data for every
factor combination?
Here is an example:
a) "complete" data set:
> example <- data.frame(f1=factor(rep(LETTERS[1:3],each=4)),
+                       f2=factor(letters[1:2]),d=1:12)
> aggregate(cbind(d=example$d),by=list(f1=example$f1,f2=example$f2),sum)
  f1 f2  d
1  A  a  4
2  B  a 12
3  C  a 20
4  A  b  6
5  B  b 14
6  C  b 22
b) data set with "missing combinations":
> example2 <- example[c(-10,-12),]
> aggregate(cbind(d=example2$d),by=list(f1=example2$f1,f2=example2$f2),sum)
  f1 f2  d
1  A  a  4
2  B  a 12
3  C  a 20
4  A  b  6
5  B  b 14
Here, I would like to have the missing row with f1=C, f2=b, d=NA.
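
For illustration, option (b) might be sketched with tapply(), which returns one cell per combination of factor levels and leaves empty cells as NA. This is only a sketch on the example data above; it relies on example2 still carrying the unused level combination (subsetting a data frame keeps factor levels), and the names sums, result and result0 are made up for the illustration.

```r
example <- data.frame(f1 = factor(rep(LETTERS[1:3], each = 4)),
                      f2 = factor(letters[1:2]),
                      d  = 1:12)
example2 <- example[c(-10, -12), ]

# One cell per combination of factor levels; combinations
# without any rows are returned as NA.
sums <- tapply(example2$d, list(f1 = example2$f1, f2 = example2$f2), sum)

# Flatten the f1 x f2 matrix into one row per combination.
result <- as.data.frame(as.table(sums), responseName = "d")

# If the missing combinations should count as zeroes instead:
result0 <- result
result0$d[is.na(result0$d)] <- 0
```

Here result has six rows, including the one for f1=C, f2=b with d=NA, and result0 holds the same rows with that NA replaced by 0.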
The solution I have come up with is very slow and cumbersome because there
are many factors: I create a new data frame with all factor combinations
present and then copy the results from the call to aggregate into it line
by line. I am convinced that there is a better way to do this.
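
The "all combinations, then fill in" idea can presumably be done without a line-by-line copy, by building the grid with expand.grid() and joining the aggregate onto it with merge(). The following is a minimal sketch on the example data, not a tested solution for the real file; agg, grid and full are hypothetical names.

```r
example <- data.frame(f1 = factor(rep(LETTERS[1:3], each = 4)),
                      f2 = factor(letters[1:2]),
                      d  = 1:12)
example2 <- example[c(-10, -12), ]

# Aggregate as before; only observed combinations appear.
agg <- aggregate(cbind(d = example2$d),
                 by = list(f1 = example2$f1, f2 = example2$f2), sum)

# Every combination of the factor levels, observed or not.
grid <- expand.grid(f1 = levels(example2$f1), f2 = levels(example2$f2))

# Keep every row of grid; combinations absent from agg get d = NA.
full <- merge(grid, agg, by = c("f1", "f2"), all.x = TRUE)
```

With more factors, only the expand.grid() call and the by= argument would need the additional columns; whether this is fast enough for a large file is something to check on the real data.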
Thanks for your help
Pascal