[R] A faster way to aggregate?
Dieter Menne
dieter.menne at menne-biomed.de
Mon Jul 4 11:45:27 CEST 2005
Dear List,
I have a logical data frame with NAs and a grouping factor, and I want to
calculate the percentage of TRUE values per column and group. With an
indexed database, results are mainly limited by printout time, but my R
solution below keeps me waiting (there are about 10* cases in the real
data set).
Any suggestions to speed this up? Yes, I could wait for the result in real
life, but I am just curious whether I did something wrong. In real life, the
data set is ordered by group, but how can I make use of this with a data frame?
Dieter Menne
# Generate test data
ncol = 20
nrow = 20000
ngroup=nrow %/% 20
colrow=ncol*nrow
group = factor(floor(runif(nrow)*ngroup))
sc = data.frame(group,
                matrix(ifelse(runif(colrow) > 0.1, runif(colrow) > 0.3, NA),
                       nrow=nrow))
# aggregate
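system.time ({
  s = aggregate(sc[2:(ncol+1)], list(group = group),
    function(x) {
      # table() drops NAs; xt[1] counts FALSE, xt[2] counts TRUE.
      # xt[2] is NA when only one of the two values occurs in a group.
      xt = table(x)
      as.integer(100*xt[2]/(xt[1]+xt[2]))
    }
  )
})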
# 26.09 0.03 26.95 NA NA
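
The per-column summary used above can probably be written without table():
for a logical vector, sum(x, na.rm = TRUE) counts the TRUE values and
sum(!is.na(x)) counts the non-missing ones. The following is a minimal
sketch under that assumption, not a measured improvement. The helper name
pct_true and the small data frame d are made up for illustration. Unlike the
table() version, it does not return NA when a group contains only TRUE or
only FALSE values.

```r
# Hypothetical helper: percent TRUE among non-NA values, truncated to integer.
pct_true <- function(x) {
  as.integer(100 * sum(x, na.rm = TRUE) / sum(!is.na(x)))
}

# Made-up example data: two groups, two logical columns containing NAs.
d <- data.frame(
  group = factor(c("a", "a", "a", "b", "b", "b")),
  X1 = c(TRUE, FALSE, NA, TRUE, TRUE, NA),
  X2 = c(FALSE, FALSE, TRUE, NA, NA, TRUE)
)

# Same aggregate() call shape as in the post, with the simpler summary.
s <- aggregate(d[c("X1", "X2")], list(group = d$group), pct_true)
print(s)
```

A group with no non-missing values at all gives 0/0, which as.integer()
turns into NA.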
# by and apply
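system.time ({
  # by() splits the data frame by group; apply() then loops over columns
  s = by (sc[2:(ncol+1)], group, function(x) {
    apply(x, 2, function(x) {
      xt = table(x)
      as.integer(100*xt[2]/(xt[1]+xt[2]))
    }
    )
  })
  # bind the per-group result vectors into one matrix
  s = do.call("rbind", s)
})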
# 82.89 0.18 85.16 NA NA
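
Both versions above call an R function once per group and column. One
possible way to avoid that, sketched here as an assumption rather than a
benchmarked answer, is to convert the logical columns to a matrix and let
rowsum() from base R compute the per-group counts of TRUE and of non-missing
values in two calls. The helper name pct_true_by_group and the small example
data are hypothetical.

```r
# Hypothetical helper: percent TRUE per group (rows) and column, ignoring NAs.
pct_true_by_group <- function(df, group) {
  m <- as.matrix(df)                          # logical matrix, NAs preserved
  valid <- !is.na(m)                          # TRUE where a value is present
  n_true  <- rowsum((valid & m) + 0L, group)  # TRUE counts per group
  n_valid <- rowsum(valid + 0L, group)        # non-NA counts per group
  trunc(100 * n_true / n_valid)               # NaN if a group has no values
}

# Made-up example data: two groups, two logical columns containing NAs.
group <- factor(c("a", "a", "a", "b", "b", "b"))
x <- data.frame(
  X1 = c(TRUE, FALSE, NA, TRUE, TRUE, NA),
  X2 = c(FALSE, FALSE, TRUE, NA, NA, TRUE)
)

res <- pct_true_by_group(x, group)
print(res)
```

With the test data from the post, the call would be
pct_true_by_group(sc[2:(ncol+1)], group). The result is a numeric matrix with
one row per group, not a data frame as returned by aggregate().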