[Rd] split() is slow on data.frame (PR#14123)
pengyu.ut at gmail.com
Wed Dec 9 23:10:09 CET 2009
Please see the following code for a runtime comparison between
split() and mysplit.data.frame() (they are intended to do the same
thing semantically). mysplit.data.frame() is a fix for split() in terms
of performance. Could somebody include this fix (with possible checks
for corner cases) in a future version of R, and let me know when it
has been included?
m <- 300000   # number of rows
n <- 6        # number of columns
k <- 30000    # number of groups
set.seed(0)
x <- replicate(n, rnorm(m))
f <- sample(1:k, size = m, replace = TRUE)

mysplit.data.frame <- function(x, f) {
  print('processing data.frame')
  # Split each column separately; splitting plain vectors is fast.
  v <- lapply(
    seq_len(ncol(x)),
    function(i) {
      split(x[, i], f)
    }
  )
  # For each group, bind that group's piece of every column back together.
  w <- lapply(
    seq_along(v[[1]]),
    function(i) {
      result <- do.call(
        cbind,
        lapply(v, function(vj) vj[[i]])
      )
      colnames(result) <- colnames(x)
      return(result)
    }
  )
  names(w) <- names(v[[1]])
  return(w)
}

system.time(split(as.data.frame(x), f))
system.time(mysplit.data.frame(as.data.frame(x), f))
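The timing alone does not show that the two results agree, so here is a minimal sketch of a check on a small input. The name split_by_column is hypothetical (it is not part of base R); it is a compact restatement of the column-wise approach above. It assumes all columns are numeric, since cbind() would coerce mixed types to a common one.

```r
# Compact restatement of the column-wise approach from the post.
# split_by_column is a hypothetical name used only for this sketch.
split_by_column <- function(x, f) {
  # Split every column as a plain vector; all columns share the same grouping.
  cols <- lapply(x, split, f)
  groups <- names(cols[[1]])
  out <- lapply(groups, function(g) {
    # Reassemble one group's pieces, column by column, into a matrix.
    do.call(cbind, lapply(cols, `[[`, g))
  })
  names(out) <- groups
  out
}

set.seed(0)
small <- as.data.frame(replicate(3, rnorm(20)))
f <- sample(1:4, size = 20, replace = TRUE)

base_result <- split(small, f)
fast_result <- split_by_column(small, f)

# Compare values group by group, ignoring row and column names.
same_values <- all(mapply(
  function(a, b) isTRUE(all.equal(unname(as.matrix(a)), unname(b))),
  base_result, fast_result
))
same_names <- identical(names(base_result), names(fast_result))

# The element types differ: data frames from split(), matrices from cbind().
is_df  <- is.data.frame(base_result[[1]])
is_mat <- is.matrix(fast_result[[1]])
```

One corner case this makes visible: split() on a data.frame returns a list of data frames that keep their original row names, while the column-wise version returns a list of matrices without them. The values match, but any fix included in R would presumably need to rebuild data frames to keep the return type unchanged.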