[R] aggregate function - na.action

Ista Zahn izahn at psych.rochester.edu
Sat Feb 5 01:05:14 CET 2011


Hi,

On Fri, Feb 4, 2011 at 6:33 PM, Gene Leynes <gleynes+r at gmail.com> wrote:
> Thank you both for the thoughtful (and funny) replies.
>
> I agree with both of you that sum is the one picking up the na.rm argument
> from aggregate. Although I didn't mention it, I did realize that in the
> first place.
> Also, thank you Phil for pointing out that aggregate only accepts a formula
> in more recent versions!  I actually thought that was an older
> feature, but I must be thinking of other functions.
>
> I still don't see why these two values are not the same!
>
> It seems like a bug to me.

No, not a bug (see below).

>
>> set.seed(100)
>> dat=data.frame(
> +         x1=sample(c(NA,'m','f'), 100, replace=TRUE),
> +         x2=sample(c(NA, 1:10), 100, replace=TRUE),
> +         x3=sample(c(NA,letters[1:5]), 100, replace=TRUE),
> +         x4=sample(c(NA,T,F), 100, replace=TRUE),
> +         y=sample(c(rep(NA,5), rnorm(95))))
>> sum(dat$y, na.rm=T)
> [1] 0.0815244116598
>> sum(aggregate(y~x1+x2+x3+x4, data=dat, sum, na.action=na.pass, na.rm=T)$y)
> [1] -4.45087666247
>>

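Because in the first one you are only removing missing data in dat$y.
In the second one, aggregate drops every row that has missing data in
any of the grouping columns (x1 to x4), and na.rm=T then removes the
remaining NAs in y, so in effect you are removing all rows that contain
missing data in any of the columns.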

all.equal(sum(na.omit(dat)$y), sum(aggregate(y~x1+x2+x3+x4, data=dat,
sum, na.action=na.pass, na.rm=T)$y))
[1] TRUE
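A minimal sketch of the same effect on a tiny made-up data frame `d` (hypothetical, not your `dat`), assuming a current version of R, where aggregate() leaves out rows whose grouping value is NA even when na.action=na.pass lets the NAs in y through:

```r
# Hypothetical toy data: one NA in the grouping column g, one NA in y
d <- data.frame(g = c("a", "a", "b", NA),
                y = c(1, NA, 2, 10))

# Plain sum: only the NA in y is removed, so this is 1 + 2 + 10
total_y <- sum(d$y, na.rm = TRUE)

# Formula method with na.pass: the NA in y reaches sum(), which drops it
# because of na.rm = TRUE, but the row with g = NA is still left out of
# every group, so the group totals only add up to 1 + 2
agg <- aggregate(y ~ g, data = d, FUN = sum,
                 na.action = na.pass, na.rm = TRUE)
total_agg <- sum(agg$y)

# Same as summing y over the rows with no missing value in any column
total_complete <- sum(na.omit(d)$y)

print(c(total_y = total_y, total_agg = total_agg,
        total_complete = total_complete))
```

The row with y = 10 is the one that disappears: its y is fine, but its group is NA.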

Best,
Ista

>
>
>
> On Fri, Feb 4, 2011 at 4:18 PM, Ista Zahn <izahn at psych.rochester.edu> wrote:
>>
>> Sorry, I didn't see Phil's reply, which is better than mine anyway.
>>
>> -Ista
>>
>> On Fri, Feb 4, 2011 at 5:16 PM, Ista Zahn <izahn at psych.rochester.edu>
>> wrote:
>> > Hi,
>> >
>> > Please see ?na.action
>> >
>> > (just kidding!)
>> >
>> > So it seems to me the problem is that you are passing na.rm to the sum
>> > function. So there is no missing data for the na.action argument to
>> > operate on!
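To see that extra arguments given to aggregate() are handed on to FUN, here is a sketch with a hypothetical helper `count_extra()` that just reports how many extra arguments it receives, on a hypothetical toy data frame `d`. It also suggests why the data.frame calls in the original post fail: that method has no na.action argument, so na.action is forwarded to sum() as one more value to add, and sum() cannot add a character string (or a function).

```r
# Hypothetical toy data
d <- data.frame(g = c("a", "a", "b"), y = c(1, 2, 3))

# Hypothetical helper: returns how many arguments arrived through `...`
count_extra <- function(v, ...) length(list(...))

# na.rm = TRUE is not used by aggregate() itself; it is forwarded to FUN
fwd_narm <- aggregate(d$y, d["g"], count_extra, na.rm = TRUE)

# The data.frame method has no na.action argument, so this is forwarded too
fwd_action <- aggregate(d$y, d["g"], count_extra, na.action = "na.pass")

# With FUN = sum, the forwarded na.action becomes one more value to add,
# and sum() stops with an error because it cannot add a character string
sum_err <- tryCatch(aggregate(d$y, d["g"], sum, na.action = "na.pass"),
                    error = function(e) e)

print(fwd_narm)
print(fwd_action)
```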
>> >
>> > Compare
>> >
>> > sum(aggregate(y~x1+x2+x3+x4, data=dat, sum, na.action=na.fail)$y)
>> > sum(aggregate(y~x1+x2+x3+x4, data=dat, sum, na.action=na.pass)$y)
>> > sum(aggregate(y~x1+x2+x3+x4, data=dat, sum, na.action=na.omit)$y)
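A sketch of what the three na.action settings in that comparison do, on a hypothetical toy data frame `d` rather than `dat`, and without na.rm so that sum() sees any NA that survives:

```r
# Hypothetical toy data: group "a" contains one NA in y
d <- data.frame(g = c("a", "a", "b"), y = c(1, NA, 2))

# na.omit drops the row with the NA before grouping: totals a = 1, b = 2
res_omit <- aggregate(y ~ g, data = d, FUN = sum, na.action = na.omit)

# na.pass keeps that row; without na.rm, sum() returns NA for group "a"
res_pass <- aggregate(y ~ g, data = d, FUN = sum, na.action = na.pass)

# na.fail stops with an error because the data contain an NA
res_fail <- tryCatch(
  aggregate(y ~ g, data = d, FUN = sum, na.action = na.fail),
  error = function(e) e
)

print(res_omit)
print(res_pass)
```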
>> >
>> >
>> > Best,
>> > Ista
>> >
>> > On Fri, Feb 4, 2011 at 4:07 PM, Gene Leynes <gleynes+r at gmail.com> wrote:
>> >> Can someone please tell me what is up with na.action in aggregate?
>> >>
>> >> My (somewhat) reproducible example:
>> >> (I say somewhat because some lines wouldn't run in a separate session;
>> >> more below)
>> >>
>> >> set.seed(100)
>> >> dat=data.frame(
>> >>        x1=sample(c(NA,'m','f'), 100, replace=TRUE),
>> >>        x2=sample(c(NA, 1:10), 100, replace=TRUE),
>> >>        x3=sample(c(NA,letters[1:5]), 100, replace=TRUE),
>> >>        x4=sample(c(NA,T,F), 100, replace=TRUE),
>> >>        y=sample(c(rep(NA,5), rnorm(95))))
>> >> dat
>> >> ## The total from dat:
>> >> sum(dat$y, na.rm=T)
>> >> ## The total from aggregate:
>> >> sum(aggregate(dat$y, dat[,1:4], sum, na.rm=T)$x)
>> >> sum(aggregate(y~x1+x2+x3+x4, data=dat, sum, na.rm=T)$y)  ## <--- This line gave an error in a separate R instance
>> >> ## The aggregate formula is excluding NA
>> >>
>> >> ## So, let's try to include NAs
>> >> sum(aggregate(y~x1+x2+x3+x4, data=dat, sum, na.rm=T, na.action='na.pass')$y)
>> >> sum(aggregate(y~x1+x2+x3+x4, data=dat, sum, na.rm=T, na.action=na.pass)$y)
>> >> ## The aggregate formula is STILL excluding NA
>> >> ## In fact, the formula doesn't seem to notice the na.action
>> >> sum(aggregate(y~x1+x2+x3+x4, data=dat, sum, na.rm=T, na.action='foo man chew')$y)
>> >> ## Hmmmm... that error surprised me (since the previous two things ran)
>> >>
>> >> ## So, let's try to change the global options
>> >> ## (not mentioned in the help, but after reading the help
>> >> ##  100 times, I thought I would go above and beyond to avoid
>> >> ##  any r list flames from people complaining
>> >> ##  that I didn't read the help... but that's a separate topic)
>> >> options(na.action ="na.pass")
>> >> sum(aggregate(dat$y, dat[,1:4], sum, na.rm=T)$x)
>> >> sum(aggregate(y~x1+x2+x3+x4, data=dat, sum, na.rm=T)$y)
>> >> sum(aggregate(y~x1+x2+x3+x4, data=dat, sum, na.rm=T, na.action='na.pass')$y)
>> >> sum(aggregate(y~x1+x2+x3+x4, data=dat, sum, na.rm=T, na.action=na.pass)$y)
>> >> ## (NAs are still omitted)
>> >>
>> >> ## Even more frustrating...
>> >> ## Why don't any of these work???
>> >> sum(aggregate(dat$y, dat[,1:4], sum, na.rm=T, na.action='na.pass')$x)
>> >> sum(aggregate(dat$y, dat[,1:4], sum, na.rm=T, na.action=na.pass)$x)
>> >> sum(aggregate(dat$y, dat[,1:4], sum, na.rm=T, na.action='na.omit')$x)
>> >> sum(aggregate(dat$y, dat[,1:4], sum, na.rm=T, na.action=na.omit)$x)
>> >>
>> >>
>> >> ## This does work, but in my real data set, I want NA to really be NA
>> >> for(j in 1:4)
>> >>    dat[is.na(dat[,j]),j] = 'NA'
>> >> sum(aggregate(dat$y, dat[,1:4], sum, na.rm=T)$x)
>> >> sum(aggregate(y~x1+x2+x3+x4, data=dat, sum, na.rm=T)$y)
>> >>
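One possible alternative to recoding NA as the string 'NA' is base R's addNA(), which keeps the values missing but makes NA an explicit factor level. This is a sketch, assuming that aggregate() only drops grouping values stored as NA codes (an NA level is stored as an ordinary level code, so it is not treated as missing); the toy data frame `d` is hypothetical.

```r
# Hypothetical toy data: the last row has a missing grouping value
d <- data.frame(g = c("a", "a", "b", NA),
                y = c(1, NA, 2, 10))

# Default: the row with g = NA is dropped, leaving groups a and b
agg_default <- aggregate(d$y, d["g"], sum, na.rm = TRUE)

# addNA() turns NA into a real factor level, so that row forms its own group
by_na <- data.frame(g = addNA(d$g))
agg_na <- aggregate(d$y, by_na, sum, na.rm = TRUE)

# The group totals now add up to the plain total, sum(d$y, na.rm = TRUE)
print(agg_default)
print(agg_na)
```

With several grouping columns, the same idea would mean applying addNA() to each of them before passing them as `by`.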
>> >>
>> >> ## My first session info
>> >> #
>> >> #> sessionInfo()
>> >> #R version 2.12.0 (2010-10-15)
>> >> #Platform: i386-pc-mingw32/i386 (32-bit)
>> >> #
>> >> #locale:
>> >> #[1] LC_COLLATE=English_United States.1252
>> >> #[2] LC_CTYPE=English_United States.1252
>> >> #[3] LC_MONETARY=English_United States.1252
>> >> #[4] LC_NUMERIC=C
>> >> #[5] LC_TIME=English_United States.1252
>> >> #
>> >> #attached base packages:
>> >> #[1] stats     graphics  grDevices utils     datasets  methods   base
>> >> #
>> >> #other attached packages:
>> >> #[1] plyr_1.2.1  zoo_1.6-4   gdata_2.8.1 rj_0.5.0-5
>> >> #
>> >> #loaded via a namespace (and not attached):
>> >> #[1] grid_2.12.0     gtools_2.6.2    lattice_0.19-13 rJava_0.8-8
>> >> #[5] tools_2.12.0
>> >>
>> >>
>> >>
>> >> I tried running that example in a different version of R and got
>> >> completely different results.
>> >>
>> >> The other version of R wouldn't recognize the formula at all.
>> >>
>> >> My other version of R:
>> >>
>> >> #  My second session info
>> >> #> sessionInfo()
>> >> #R version 2.10.1 (2009-12-14)
>> >> #i386-pc-mingw32
>> >> #
>> >> #locale:
>> >> #[1] LC_COLLATE=English_United States.1252
>> >> #[2] LC_CTYPE=English_United States.1252
>> >> #[3] LC_MONETARY=English_United States.1252
>> >> #[4] LC_NUMERIC=C
>> >> #[5] LC_TIME=English_United States.1252
>> >> #
>> >> #attached base packages:
>> >> #[1] stats     graphics  grDevices utils     datasets  methods   base
>> >> #>
>> >> #
>> >>
>> >> PS: Also, I have read the help on aggregate, factor, as.factor, and
>> >> several other topics.  If I missed something, please let me know.
>> >> Some people like to reply to questions by telling the sender that R has
>> >> documentation.  Please don't.  The R help archives are littered with
>> >> reminders, friendly and otherwise, of R's documentation.
>> >>
>> >>
>> >> ______________________________________________
>> >> R-help at r-project.org mailing list
>> >> https://stat.ethz.ch/mailman/listinfo/r-help
>> >> PLEASE do read the posting guide
>> >> http://www.R-project.org/posting-guide.html
>> >> and provide commented, minimal, self-contained, reproducible code.
>> >>
>> >
>> >
>> >
>> > --
>> > Ista Zahn
>> > Graduate student
>> > University of Rochester
>> > Department of Clinical and Social Psychology
>> > http://yourpsyche.org
>> >
>>
>
>





