[R] aggregate function - na.action
Denis Kazakiewicz
d.kazakiewicz at gmail.com
Sun Feb 6 22:15:03 CET 2011
Try the formula notation with na.action = na.pass.
It is all described in help(aggregate).
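A minimal sketch of what na.action = na.pass changes, using a toy data frame (df, g and y are hypothetical names, not from the thread). With the default na.omit, a group whose only y value is NA disappears. With na.pass, that row reaches FUN, so sum(..., na.rm = TRUE) returns 0 for it. Rows whose grouping variable is NA are still dropped in both cases, because aggregate.data.frame() keeps only complete cases of the grouping variables.

```r
## Toy data (hypothetical): group "b" has only a missing y,
## and the last row has a missing group
df <- data.frame(g = c("a", "a", "b", NA),
                 y = c(1, 2, NA, 4),
                 stringsAsFactors = FALSE)

## Default na.action = na.omit: rows with NA in y or g never reach FUN,
## so group "b" vanishes from the result
res_omit <- aggregate(y ~ g, data = df, FUN = sum)

## na.action = na.pass: the NA in y is passed on to FUN, and na.rm = TRUE
## (forwarded through ...) makes group "b" sum to 0.
## The row with g = NA is still dropped.
res_pass <- aggregate(y ~ g, data = df, FUN = sum, na.rm = TRUE,
                      na.action = na.pass)

print(res_omit)
print(res_pass)
```

So na.pass helps with missing values in the response; it does not by itself keep missing values in the grouping variables.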
On Sun, 06/02/2011 at 14:54 -0600, Gene Leynes wrote:
> On Fri, Feb 4, 2011 at 6:54 PM, Ista Zahn <izahn at psych.rochester.edu> wrote:
>
> > >
> > > However, I don't think you've told us what you're actually trying to
> > > accomplish...
> > >
> >
>
> I'm trying to aggregate the y values of a large data set that has several
> x's and one y.
> I'm using an abstracted example for several reasons. Partly it's to comply
> with the posting guide's request for a reproducible example; also, what I'm
> really aggregating is some incredibly boring and complex customer data for
> an undisclosed client.
>
> As it turns out, aggregate() silently drops every row in which any of the
> x's is NA, unless you first convert those x's to factors with NA included
> as a level.
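A minimal sketch of the factor conversion described above, on a four-row toy data frame (toy, agg_default, by_na and agg_na are hypothetical names). addNA() is base R; here it is applied only to a copy of the key column placed in the 'by' list, so the original column is left as it is.

```r
## Toy data (hypothetical): two of the four rows have a missing key
toy <- data.frame(x1 = c("m", NA, "f", NA),
                  y  = c(1, 2, 3, 4),
                  stringsAsFactors = FALSE)

## Plain aggregate(): rows whose key is NA are silently dropped
agg_default <- aggregate(toy$y, toy["x1"], sum)

## addNA() turns the key into a factor with NA as an explicit level.
## Only the 'by' list is converted; toy itself is unchanged.
by_na  <- lapply(toy["x1"], addNA)
agg_na <- aggregate(toy$y, by_na, sum)

sum(toy$y)          # control total: 10
sum(agg_default$x)  # 4: the rows with a missing key (2 + 4) are gone
sum(agg_na$x)       # 10: NA is kept as its own group
```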
>
> In my case the data are so big that these conversions cause other memory
> problems, and they make some of my numeric columns useless for other
> calculations.
>
> My real data looks more like this (except with a few more categories and
> records):
>
> set.seed(100)
> library(plyr)   # for ddply() and .()
> dat <- data.frame(
>     x1 = sample(c(NA, 'm', 'f'), 2e6, replace = TRUE),
>     x2 = sample(c(NA, 1:10), 2e6, replace = TRUE),
>     x3 = sample(c(NA, letters[1:5]), 2e6, replace = TRUE),
>     x4 = sample(c(NA, TRUE, FALSE), 2e6, replace = TRUE),
>     x5 = sample(c(NA, 'active', 'inactive', 'deleted', 'resumed'), 2e6,
>                 replace = TRUE),
>     x6 = sample(c(NA, 1:10), 2e6, replace = TRUE),
>     x7 = sample(c(NA, 'married', 'divorced', 'separated', 'single', 'etc'),
>                 2e6, replace = TRUE),
>     x8 = sample(c(NA, TRUE, FALSE), 2e6, replace = TRUE),
>     y  = trunc(rnorm(2e6) * 10000),
>     stringsAsFactors = FALSE)
> str(dat)
> ## The control total
> sum(dat$y, na.rm = TRUE)
> ## The aggregate total: differs from the control total because rows
> ## with an NA in any of x1..x8 are dropped
> sum(aggregate(dat$y, dat[, 1:8], sum, na.rm = TRUE)$x)
> ## The ddply total
> sum(ddply(dat, .(x1, x2, x3, x4, x5, x6, x7, x8), function(d)
>     data.frame(y.sum = sum(d$y, na.rm = TRUE)))$y.sum)
>
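One way to reconcile the totals without converting any column of the data, sketched here as an assumption and not something proposed in this thread: paste the key columns into a single character label (paste() turns NA into the string "NA", so missing keys form their own groups) and sum y per label with base R's rowsum(). The data below is a smaller, hypothetical stand-in for the simulated dat above (three keys, 1e4 rows); sim, keys, key, agg and totals are illustrative names. Memory use on the full 2e6-row data has not been checked.

```r
set.seed(100)
n <- 1e4
## Smaller hypothetical stand-in for the simulated data above
sim <- data.frame(
    x1 = sample(c(NA, "m", "f"), n, replace = TRUE),
    x2 = sample(c(NA, 1:10), n, replace = TRUE),
    x4 = sample(c(NA, TRUE, FALSE), n, replace = TRUE),
    y  = trunc(rnorm(n) * 10000),
    stringsAsFactors = FALSE)

keys <- c("x1", "x2", "x4")

## paste() writes NA as "NA", so every combination, including those with
## missing keys, gets a non-missing label (a genuine string "NA" in a key
## column would be merged with the missing values)
key <- do.call(paste, c(sim[keys], sep = "|"))

## rowsum() adds up y within each label and returns a one-column matrix
totals <- rowsum(sim$y, key, na.rm = TRUE)

## aggregate() on the raw keys, for comparison: NA keys are dropped
agg <- aggregate(sim$y, sim[keys], sum, na.rm = TRUE)

sum(sim$y, na.rm = TRUE)  # control total
sum(agg$x)                # smaller set of groups, different total
sum(totals)               # equals the control total
```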
> ddply worked a little better than I expected at first, but it slows to a
> crawl or runs out of memory too often for me to invest the time in learning
> how to use it. Just now it worked for 1M records and was only a bit slower
> than aggregate, but on the 2M example it still hasn't finished calculating.
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.