[R] aggregate.formula implicitly removes rows containing NA
Dickison, Daniel
ddickison at carnegielearning.com
Wed Jan 12 02:29:19 CET 2011
Oh wow, that would be it. Not sure how I missed that. Thanks for the tip.
On Jan 11, 2011, at 18:56, "David Winsemius" <dwinsemius at comcast.net> wrote:
>
> On Jan 11, 2011, at 5:41 PM, Dickison, Daniel wrote:
>
>> The documentation for `aggregate` makes it sound like
>> aggregate.formula should behave identically to aggregate.data.frame
>> (apart from the way the parameters are passed). But it looks like
>> aggregate.formula is quietly removing rows where any of the "output"
>> variables (those on the LHS of the formula) are NA. This differs
>> from how aggregate.data.frame works. Is this expected behavior?
>>
>> Here are a couple of examples:
>>
>>> d <- data.frame(a=rep(1:2, each=2),
>> + b=c(1,2,NA,3))
>>> aggregate(d["b"], d["a"], mean)
>> a b
>> 1 1 1.5
>> 2 2 NA
>>> aggregate(b ~ a, d, mean)
>> a b
>> 1 1 1.5
>> 2 2 3.0
>>
>> It's removing whole rows even if just one of the columns is NA, i.e.:
>>
>>> d <- data.frame(a=rep(1:2, each=2),
>> + b=c(1,2,NA,3),
>> + c=c(NA,2,3,NA))
>>> aggregate(cbind(b,c) ~ a, d, mean)
>> a b c
>> 1 1 2 2
>>
>
> The help page for aggregate gives the calling defaults for
> aggregate.formula as:
> ## S3 method for class 'formula'
> aggregate(formula, data, FUN, ...,
>           subset, na.action = na.omit)
> So the behavior you describe seems to be what I would have
> expected (had I initially read the help page).
> --
> David Winsemius, MD
> West Hartford, CT
>
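A minimal R sketch, reusing the data frames from the thread, of how the `na.action` default explains the difference. Passing `na.action = na.pass` to the formula method should keep the NA rows, so the result matches `aggregate.data.frame`. Extra arguments such as `na.rm = TRUE` are forwarded through `...` to `FUN`. The `res_*` and `d2` variable names are illustrative only.

```r
# Data from the thread: one NA in the response column b
d <- data.frame(a = rep(1:2, each = 2), b = c(1, 2, NA, 3))

# Default formula method: na.action = na.omit drops row 3 before grouping
res_omit <- aggregate(b ~ a, d, mean)

# na.pass keeps every row, so group 2 averages c(NA, 3) and yields NA,
# matching the data.frame method called without a formula
res_pass <- aggregate(b ~ a, d, mean, na.action = na.pass)
res_df   <- aggregate(d["b"], d["a"], mean)

# Arguments in ... go to FUN, so na.rm = TRUE drops NAs per column
# within each group instead of dropping whole rows up front
d2 <- data.frame(a = rep(1:2, each = 2),
                 b = c(1, 2, NA, 3),
                 c = c(NA, 2, 3, NA))
res_rm <- aggregate(cbind(b, c) ~ a, d2, mean,
                    na.action = na.pass, na.rm = TRUE)

print(res_omit)
print(res_pass)
print(res_rm)
```

Which variant is appropriate depends on the analysis: `na.omit` performs listwise deletion across all LHS columns, while `na.pass` plus `na.rm = TRUE` removes missing values column by column inside each group. The two give different answers whenever the NAs are scattered across columns, as in the `cbind(b, c)` example above.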
More information about the R-help mailing list