[R] aggregate function - na.action

Ista Zahn izahn at psych.rochester.edu
Fri Feb 4 23:18:35 CET 2011


Sorry, I didn't see Phil's reply, which is better than mine anyway.

-Ista

On Fri, Feb 4, 2011 at 5:16 PM, Ista Zahn <izahn at psych.rochester.edu> wrote:
> Hi,
>
> Please see ?na.action
>
> (just kidding!)
>
> So it seems to me the problem is that you are passing na.rm=T on to the
> sum function, which drops any missing y values that na.action lets
> through, so changing na.action makes no visible difference!
>
> Compare
>
> sum(aggregate(y~x1+x2+x3+x4, data=dat, sum, na.action=na.fail)$y)
> sum(aggregate(y~x1+x2+x3+x4, data=dat, sum, na.action=na.pass)$y)
> sum(aggregate(y~x1+x2+x3+x4, data=dat, sum, na.action=na.omit)$y)
>
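A minimal sketch of how the two arguments interact, using a small hypothetical data frame d (not the poster's dat). It assumes a current R version: na.action is applied by the formula method when it builds the model frame, while na.rm is simply forwarded to sum through '...'.

```r
# Hypothetical data: one NA in the response y, one NA in the grouping variable g.
d <- data.frame(g = c("a", "a", "b", NA),
                y = c(1, NA, 3, 4))

# Default na.action = na.omit: rows 2 and 4 are dropped before sum() is called.
omit <- aggregate(y ~ g, data = d, FUN = sum)

# na.pass keeps the NA in y, so sum() returns NA for group "a".
pass <- aggregate(y ~ g, data = d, FUN = sum, na.action = na.pass)

# With na.rm = TRUE forwarded to sum(), that NA is removed again,
# and the result looks the same as under na.omit.
pass_rm <- aggregate(y ~ g, data = d, FUN = sum,
                     na.rm = TRUE, na.action = na.pass)

print(omit)
print(pass)
print(pass_rm)

# The row with a missing g is left out of the result in all three cases,
# so the aggregated total is smaller than the raw total.
sum(d$y, na.rm = TRUE)
sum(pass_rm$y)
```

In this sketch the raw total is 8 but the aggregated total is 4, because ?aggregate documents that rows with missing values in any of the by variables are omitted from the result, whatever na.action says.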
>
> Best,
> Ista
>
> On Fri, Feb 4, 2011 at 4:07 PM, Gene Leynes <gleynes+r at gmail.com> wrote:
>> Can someone please tell me what is up with na.action in aggregate?
>>
>> My (somewhat) reproducible example:
>> (I say somewhat because some lines wouldn't run in a separate session, more
>> below)
>>
>> set.seed(100)
>> dat=data.frame(
>>        x1=sample(c(NA,'m','f'), 100, replace=TRUE),
>>        x2=sample(c(NA, 1:10), 100, replace=TRUE),
>>        x3=sample(c(NA,letters[1:5]), 100, replace=TRUE),
>>        x4=sample(c(NA,T,F), 100, replace=TRUE),
>>        y=sample(c(rep(NA,5), rnorm(95))))
>> dat
>> ## The total from dat:
>> sum(dat$y, na.rm=T)
>> ## The total from aggregate:
>> sum(aggregate(dat$y, dat[,1:4], sum, na.rm=T)$x)
>> ## The next line gave an error in a separate R instance
>> sum(aggregate(y~x1+x2+x3+x4, data=dat, sum, na.rm=T)$y)
>> ## The aggregate formula is excluding NA
>>
>> ## So, let's try to include NAs
>> sum(aggregate(y~x1+x2+x3+x4, data=dat, sum, na.rm=T, na.action='na.pass')$y)
>> sum(aggregate(y~x1+x2+x3+x4, data=dat, sum, na.rm=T, na.action=na.pass)$y)
>> ## The aggregate formula is STILL excluding NA
>> ## In fact, the formula doesn't seem to notice the na.action
>> sum(aggregate(y~x1+x2+x3+x4, data=dat, sum, na.rm=T, na.action='foo man chew')$y)
>> ## Hmmmm... that error surprised me (since the previous two things ran)
>>
>> ## So, let's try to change the global options
>> ## (not mentioned in the help, but after reading the help
>> ##  100 times, I thought I would go above and beyond to avoid
>> ##  any r list flames from people complaining
>> ##  that I didn't read the help... but that's a separate topic)
>> options(na.action ="na.pass")
>> sum(aggregate(dat$y, dat[,1:4], sum, na.rm=T)$x)
>> sum(aggregate(y~x1+x2+x3+x4, data=dat, sum, na.rm=T)$y)
>> sum(aggregate(y~x1+x2+x3+x4, data=dat, sum, na.rm=T, na.action='na.pass')$y)
>> sum(aggregate(y~x1+x2+x3+x4, data=dat, sum, na.rm=T, na.action=na.pass)$y)
>> ## (NAs are still omitted)
>>
>> ## Even more frustrating...
>> ## Why don't any of these work???
>> sum(aggregate(dat$y, dat[,1:4], sum, na.rm=T, na.action='na.pass')$x)
>> sum(aggregate(dat$y, dat[,1:4], sum, na.rm=T, na.action=na.pass)$x)
>> sum(aggregate(dat$y, dat[,1:4], sum, na.rm=T, na.action='na.omit')$x)
>> sum(aggregate(dat$y, dat[,1:4], sum, na.rm=T, na.action=na.omit)$x)
>>
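One likely reason the calls above fail, sketched under the assumption of a current R version: only the formula method of aggregate has an na.action argument. The data frame method does not, so na.action falls into '...' and is handed to sum as if it were data. The helper has_na_action and the data frame d are hypothetical names used only for this illustration.

```r
# Does a given aggregate method list na.action among its formal arguments?
has_na_action <- function(method) {
  "na.action" %in% names(formals(getS3method("aggregate", method)))
}
has_na_action("formula")
has_na_action("data.frame")

# Hypothetical data with no missing values at all.
d <- data.frame(g = c("a", "a", "b"), y = c(1, 2, 3))

# na.action is passed on to sum(), which tries to add up a character string
# and stops with an error; tryCatch() captures the message instead.
res <- tryCatch(
  aggregate(d["y"], by = list(g = d$g), FUN = sum, na.action = "na.pass"),
  error = function(e) conditionMessage(e)
)
res
```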
>>
>> ## This does work, but in my real data set, I want NA to really be NA
>> for(j in 1:4)
>>    dat[is.na(dat[,j]),j] = 'NA'
>> sum(aggregate(dat$y, dat[,1:4], sum, na.rm=T)$x)
>> sum(aggregate(y~x1+x2+x3+x4, data=dat, sum, na.rm=T)$y)
>>
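A possible alternative to overwriting NA with the string 'NA', sketched on a hypothetical data frame d and assuming a current R version: wrap the grouping variable in addNA(), which makes NA an explicit factor level for the grouping only and leaves the data untouched.

```r
# Hypothetical data: one NA in the response y, one NA in the grouping variable g.
d <- data.frame(g = c("a", "a", "b", NA),
                y = c(1, NA, 3, 4))

# addNA() turns NA into a factor level, so the row with a missing g
# forms its own group instead of being dropped.
keep <- aggregate(d["y"],
                  by = list(g = addNA(factor(d$g))),
                  FUN = sum, na.rm = TRUE)
keep

# The aggregated total now matches the raw total.
sum(keep$y)
sum(d$y, na.rm = TRUE)

# The original column still holds a real NA.
sum(is.na(d$g))
```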
>>
>> ## My first session info
>> #
>> #> sessionInfo()
>> #R version 2.12.0 (2010-10-15)
>> #Platform: i386-pc-mingw32/i386 (32-bit)
>> #
>> #locale:
>> #[1] LC_COLLATE=English_United States.1252
>> #[2] LC_CTYPE=English_United States.1252
>> #[3] LC_MONETARY=English_United States.1252
>> #[4] LC_NUMERIC=C
>> #[5] LC_TIME=English_United States.1252
>> #
>> #attached base packages:
>> #[1] stats     graphics  grDevices utils     datasets  methods   base
>> #
>> #other attached packages:
>> #[1] plyr_1.2.1  zoo_1.6-4   gdata_2.8.1 rj_0.5.0-5
>> #
>> #loaded via a namespace (and not attached):
>> #[1] grid_2.12.0     gtools_2.6.2    lattice_0.19-13 rJava_0.8-8
>> #[5] tools_2.12.0
>>
>>
>>
>> I tried running that example in a different version of R, and I got
>> completely different results.
>>
>> The other version of R wouldn't recognize the formula at all.
>>
>> My other version of R:
>>
>> #  My second session info
>> #> sessionInfo()
>> #R version 2.10.1 (2009-12-14)
>> #i386-pc-mingw32
>> #
>> #locale:
>> #[1] LC_COLLATE=English_United States.1252
>> #[2] LC_CTYPE=English_United States.1252
>> #[3] LC_MONETARY=English_United States.1252
>> #[4] LC_NUMERIC=C
>> #[5] LC_TIME=English_United States.1252
>> #
>> #attached base packages:
>> #[1] stats     graphics  grDevices utils     datasets  methods   base
>> #>
>> #
>>
>> PS: Also, I have read the help on aggregate, factor, as.factor, and several
>> other topics.  If I missed something, please let me know.
>> Some people like to reply to questions by telling the sender that R has
>> documentation.  Please don't.  The R help archives are littered with
>> reminders, friendly and otherwise, of R's documentation.
>>
>
>
>
> --
> Ista Zahn
> Graduate student
> University of Rochester
> Department of Clinical and Social Psychology
> http://yourpsyche.org
>





