[R] aggregate.formula implicitly removes rows containing NA
David Winsemius
dwinsemius at comcast.net
Wed Jan 12 00:56:13 CET 2011
On Jan 11, 2011, at 5:41 PM, Dickison, Daniel wrote:
> The documentation for `aggregate` makes it sound like
> aggregate.formula should behave identically to aggregate.data.frame
> (apart from the way the parameters are passed). But it looks like
> aggregate.formula is quietly removing rows where any of the "output"
> variables (those on the LHS of the formula) are NA. This differs
> from how aggregate.data.frame works. Is this expected behavior?
>
> Here are a couple of examples:
>
>> d <- data.frame(a=rep(1:2, each=2),
> +                 b=c(1,2,NA,3))
>> aggregate(d["b"], d["a"], mean)
>   a   b
> 1 1 1.5
> 2 2  NA
>> aggregate(b ~ a, d, mean)
>   a   b
> 1 1 1.5
> 2 2 3.0
>
> It's removing whole rows even if just one of the columns is NA, i.e.:
>
>> d <- data.frame(a=rep(1:2, each=2),
> +                 b=c(1,2,NA,3),
> +                 c=c(NA,2,3,NA))
>> aggregate(cbind(b,c) ~ a, d, mean)
>   a b c
> 1 1 2 2
>
The help page for aggregate gives the calling defaults for
aggregate.formula as:
## S3 method for class 'formula'
aggregate(formula, data, FUN, ...,
          subset, na.action = na.omit)
So the behavior you describe is what I would have expected (had I
read the help page first): with na.action = na.omit, any row that has
an NA in a variable used in the formula is dropped before FUN is
applied.
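
If the goal is to make the formula method behave like the data.frame
method, one option is to override the default na.action. The following is
a minimal sketch, not tested against every R version; the object names
keep_na and per_col are made up for illustration, and d is the
three-column data frame from the second example above.

```r
# Data from the second example in the original question
d <- data.frame(a = rep(1:2, each = 2),
                b = c(1, 2, NA, 3),
                c = c(NA, 2, 3, NA))

# na.pass keeps rows with NA, so mean() sees the NA in group a = 2,
# which matches what aggregate(d["b"], d["a"], mean) returns
keep_na <- aggregate(b ~ a, data = d, FUN = mean, na.action = na.pass)
print(keep_na)

# Keep all rows, but let mean() drop NAs column by column;
# na.rm = TRUE is passed through ... to FUN
per_col <- aggregate(cbind(b, c) ~ a, data = d, FUN = mean,
                     na.rm = TRUE, na.action = na.pass)
print(per_col)
```

With na.pass the NA handling is left to FUN, so each output column
is summarised from its own non-missing values instead of losing every
row in which any column is NA.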
--
David Winsemius, MD
West Hartford, CT