[R] aggregate function - na.action
Ista Zahn
izahn at psych.rochester.edu
Fri Feb 4 23:16:34 CET 2011
Hi,
Please see ?na.action
(just kidding!)
So it seems to me the problem is that you are passing na.rm=T on to the sum
function. That drops any missing y values inside each group, so na.pass and
na.omit end up giving the same total and na.action appears to do nothing!
Compare
sum(aggregate(y~x1+x2+x3+x4, data=dat, sum, na.action=na.fail)$y)
sum(aggregate(y~x1+x2+x3+x4, data=dat, sum, na.action=na.pass)$y)
sum(aggregate(y~x1+x2+x3+x4, data=dat, sum, na.action=na.omit)$y)
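To make the comparison easier to follow, here is a minimal sketch on a small made-up data frame (toy, g and y are my own names, not from your data). It assumes the formula method of aggregate, as in your R 2.12 session, and that rows with NA in a grouping variable are dropped by aggregate itself regardless of na.action, which is what the help page describes.

```r
# Toy data: one NA in the response, one NA in the grouping variable.
toy <- data.frame(
  g = c("a", "a", "b", "b", NA),
  y = c(1, NA, 2, 3, 4)
)

# na.omit (the default): rows with any NA are dropped before sum() is called.
res_omit <- aggregate(y ~ g, data = toy, FUN = sum, na.action = na.omit)

# na.pass without na.rm: the NA in y reaches sum(), so group "a" comes back NA.
res_pass <- aggregate(y ~ g, data = toy, FUN = sum, na.action = na.pass)

# na.pass with na.rm = TRUE: sum() discards the NA, hiding what na.pass did.
res_pass_rm <- aggregate(y ~ g, data = toy, FUN = sum, na.rm = TRUE,
                         na.action = na.pass)

# na.fail: stops with an error as soon as the data contain any NA.
res_fail <- tryCatch(
  aggregate(y ~ g, data = toy, FUN = sum, na.action = na.fail),
  error = function(e) "error"
)

print(res_omit)
print(res_pass)
print(res_pass_rm)
print(res_fail)
```

In all three successful calls the row with a missing g is absent from the result, so na.action only changes how missing values in y are handled here.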
Best,
Ista
On Fri, Feb 4, 2011 at 4:07 PM, Gene Leynes <gleynes+r at gmail.com> wrote:
> Can someone please tell me what is up with na.action in aggregate?
>
> My (somewhat) reproducible example:
> (I say somewhat because some lines wouldn't run in a separate session, more
> below)
>
> set.seed(100)
> dat=data.frame(
> x1=sample(c(NA,'m','f'), 100, replace=TRUE),
> x2=sample(c(NA, 1:10), 100, replace=TRUE),
> x3=sample(c(NA,letters[1:5]), 100, replace=TRUE),
> x4=sample(c(NA,T,F), 100, replace=TRUE),
> y=sample(c(rep(NA,5), rnorm(95))))
> dat
> ## The total from dat:
> sum(dat$y, na.rm=T)
> ## The total from aggregate:
> sum(aggregate(dat$y, dat[,1:4], sum, na.rm=T)$x)
> sum(aggregate(y~x1+x2+x3+x4, data=dat, sum, na.rm=T)$y) ## <--- This line gave an error in a separate R instance
> ## The aggregate formula is excluding NA
>
> ## So, let's try to include NAs
> sum(aggregate(y~x1+x2+x3+x4, data=dat, sum, na.rm=T, na.action='na.pass')$y)
> sum(aggregate(y~x1+x2+x3+x4, data=dat, sum, na.rm=T, na.action=na.pass)$y)
> ## The aggregate formula is STILL excluding NA
> ## In fact, the formula doesn't seem to notice the na.action
> sum(aggregate(y~x1+x2+x3+x4, data=dat, sum, na.rm=T, na.action='foo man chew')$y)
> ## Hmmmm... that error surprised me (since the previous two things ran)
>
> ## So, let's try to change the global options
> ## (not mentioned in the help, but after reading the help
> ## 100 times, I thought I would go above and beyond to avoid
> ## any r list flames from people complaining
> ## that I didn't read the help... but that's a separate topic)
> options(na.action ="na.pass")
> sum(aggregate(dat$y, dat[,1:4], sum, na.rm=T)$x)
> sum(aggregate(y~x1+x2+x3+x4, data=dat, sum, na.rm=T)$y)
> sum(aggregate(y~x1+x2+x3+x4, data=dat, sum, na.rm=T, na.action='na.pass')$y)
> sum(aggregate(y~x1+x2+x3+x4, data=dat, sum, na.rm=T, na.action=na.pass)$y)
> ## (NAs are still omitted)
>
> ## Even more frustrating...
> ## Why don't any of these work???
> sum(aggregate(dat$y, dat[,1:4], sum, na.rm=T, na.action='na.pass')$x)
> sum(aggregate(dat$y, dat[,1:4], sum, na.rm=T, na.action=na.pass)$x)
> sum(aggregate(dat$y, dat[,1:4], sum, na.rm=T, na.action='na.omit')$x)
> sum(aggregate(dat$y, dat[,1:4], sum, na.rm=T, na.action=na.omit)$x)
>
>
> ## This does work, but in my real data set, I want NA to really be NA
> for(j in 1:4)
> dat[is.na(dat[,j]),j] = 'NA'
> sum(aggregate(dat$y, dat[,1:4], sum, na.rm=T)$x)
> sum(aggregate(y~x1+x2+x3+x4, data=dat, sum, na.rm=T)$y)
>
>
> ## My first session info
> #
> #> sessionInfo()
> #R version 2.12.0 (2010-10-15)
> #Platform: i386-pc-mingw32/i386 (32-bit)
> #
> #locale:
> # [1] LC_COLLATE=English_United States.1252
> #[2] LC_CTYPE=English_United States.1252
> #[3] LC_MONETARY=English_United States.1252
> #[4] LC_NUMERIC=C
> #[5] LC_TIME=English_United States.1252
> #
> #attached base packages:
> # [1] stats graphics grDevices utils datasets methods base
> #
> #other attached packages:
> # [1] plyr_1.2.1 zoo_1.6-4 gdata_2.8.1 rj_0.5.0-5
> #
> #loaded via a namespace (and not attached):
> # [1] grid_2.12.0 gtools_2.6.2 lattice_0.19-13 rJava_0.8-8
> #[5] tools_2.12.0
>
>
>
> I tried running that example in a different version of R, and I got
> completely different results.
>
> The other version of R wouldn't recognize the formula at all.
>
> My other version of R:
>
> # My second session info
> #> sessionInfo()
> #R version 2.10.1 (2009-12-14)
> #i386-pc-mingw32
> #
> #locale:
> # [1] LC_COLLATE=English_United States.1252
> #[2] LC_CTYPE=English_United States.1252
> #[3] LC_MONETARY=English_United States.1252
> #[4] LC_NUMERIC=C
> #[5] LC_TIME=English_United States.1252
> #
> #attached base packages:
> # [1] stats graphics grDevices utils datasets methods base
> #>
> #
>
> PS: Also, I have read the help on aggregate, factor, as.factor, and several
> other topics. If I missed something, please let me know.
> Some people like to reply to questions by telling the sender that R has
> documentation. Please don't. The R help archives are littered with
> reminders, friendly and otherwise, of R's documentation.
>
--
Ista Zahn
Graduate student
University of Rochester
Department of Clinical and Social Psychology
http://yourpsyche.org