[R] aggregating data and missing values
Gabor Grothendieck
ggrothendieck at gmail.com
Wed Nov 2 14:34:01 CET 2005
On 11/2/05, Pascal A. Niklaus <Pascal.Niklaus at unibas.ch> wrote:
> Hi all,
>
> I would like to aggregate a large data file that is defined by a number of
> factors and associated values. The point is that not all factor level
> combinations are present in the data file -- these "missing" values are in
> fact to be treated as zeroes.
>
> Is there a straightforward way to
> a) either expand the existing data set so that the missing factor combinations
> can be added, or
> b) an "aggregate" function that generates a row of data for all given factor
> combinations.
>
> Here is an example:
>
> a) "complete" data set:
>
> > example <- data.frame(f1=factor(rep(LETTERS[1:3],each=4)),f2=factor(letters[1:2]),d=1:12)
> > aggregate(cbind(d=example$d),by=list(f1=example$f1,f2=example$f2),sum)
>   f1 f2  d
> 1  A  a  4
> 2  B  a 12
> 3  C  a 20
> 4  A  b  6
> 5  B  b 14
> 6  C  b 22
>
> b) data set with "missing combinations":
>
> > example2 <- example[c(-10,-12),]
> > aggregate(cbind(d=example2$d),by=list(f1=example2$f1,f2=example2$f2),sum)
>   f1 f2  d
> 1  A  a  4
> 2  B  a 12
> 3  C  a 20
> 4  A  b  6
> 5  B  b 14
>
> Here, I would like to have the missing row with f1=C, f2=b, d=NA.
Suppose the result of the aggregate call just shown is stored in
example2.ag. Then

merge(example2.ag, expand.grid(f1 = LETTERS[1:3], f2 = letters[1:2]),
      all = TRUE)

returns all six factor combinations, with d = NA in the row for
f1 = C, f2 = b.
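
For completeness, here is a self-contained sketch of the whole thing,
using the example data from the question. The names full and tab are
only illustrative, and taking the grid from levels() rather than from
hard-coded letters is my assumption about what one would want in
general. Since the missing combinations are really zeroes, the NA is
replaced at the end. The xtabs() lines are an alternative that gives
the zeroes directly; it was not part of the suggestion above.

```r
# Example data from the question, with the f1 = C, f2 = b rows dropped
example <- data.frame(f1 = factor(rep(LETTERS[1:3], each = 4)),
                      f2 = factor(letters[1:2]),
                      d  = 1:12)
example2 <- example[c(-10, -12), ]

# Aggregate as in the question; the C/b combination is absent here
example2.ag <- aggregate(cbind(d = example2$d),
                         by = list(f1 = example2$f1, f2 = example2$f2),
                         sum)

# Outer-join against the full grid of factor levels; C/b gets d = NA
full <- merge(example2.ag,
              expand.grid(f1 = levels(example2$f1),
                          f2 = levels(example2$f2)),
              all = TRUE)

# Treat the missing combinations as zeroes
full$d[is.na(full$d)] <- 0

# Alternative: xtabs() sums d over every level combination and
# reports 0 for empty cells; the sums end up in a column named Freq
tab <- as.data.frame(xtabs(d ~ f1 + f2, data = example2))
```

Subsetting a data frame keeps unused factor levels, so levels() still
returns all of A, B, C here even if a level has lost all of its rows.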