[R] Significant performance difference between split of a data.frame and split of vectors
David Winsemius
dwinsemius at comcast.net
Wed Dec 9 05:37:37 CET 2009
On Dec 8, 2009, at 11:28 PM, Peng Yu wrote:
> I have the following code, which tests split() on a data.frame and
> split() on each column (as a vector) separately. The runtimes differ by
> a factor of 10. When m and k increase, the difference becomes even
> bigger.
>
> I'm wondering why the performance on a data.frame is so bad. Is it a bug
> in R? Can it be improved?
You might want to look at the data.table package. The author claims
significant speed improvements over data.frames.
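For what it's worth, here is a minimal sketch of where the time probably goes. It assumes that split.data.frame() is defined as in current R sources: it splits the row indices first and then subsets the data frame once per group. With k = 3000 groups, that means about 3000 calls to the R-level `[.data.frame` method. Splitting an atomic vector, by contrast, is done in a single internal call per column. The name split_df_by_rows and the matrix-based variant at the end are my own hypothetical illustrations. They are not part of base R or data.table.

```r
# Hypothetical helper: mirrors (by assumption) how base R's split.data.frame()
# works, i.e. split the row indices, then subset the data frame per group.
split_df_by_rows <- function(df, f) {
  idx <- split(seq_len(nrow(df)), f)               # list of row indices, one per level
  lapply(idx, function(i) df[i, , drop = FALSE])   # one `[.data.frame` call per group
}

set.seed(0)
m <- 30000; n <- 6; k <- 3000
x <- replicate(n, rnorm(m))
f <- sample(1:k, size = m, replace = TRUE)
df <- as.data.frame(x)

# Same result as split(df, f); the cost is dominated by ~k data-frame subsets
system.time(by_rows <- split_df_by_rows(df, f))

# Column-wise split as in the original post: split() on a plain numeric
# vector does the grouping internally, without per-group R-level calls
system.time(by_cols <- lapply(seq_len(ncol(x)), function(j) split(x[, j], f)))

# Hypothetical workaround: still split the row indices once, but subset the
# matrix instead of the data frame, since matrix `[` is much cheaper per call
system.time(mat_rows <- lapply(split(seq_len(nrow(x)), f),
                               function(i) x[i, , drop = FALSE]))
```

If you need data frames at the end, you can convert only the groups you actually use. That avoids paying the `[.data.frame` overhead thousands of times.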
--
David.
>
>> system.time(split(as.data.frame(x),f))
> user system elapsed
> 1.700 0.010 1.786
>>
>> system.time(lapply(
> + 1:dim(x)[[2]]
> + , function(i) {
> + split(x[,i],f)
> + }
> + )
> + )
> user system elapsed
> 0.170 0.000 0.167
>
> ###########
> m=30000
> n=6
> k=3000
>
> set.seed(0)
> x=replicate(n,rnorm(m))
> f=sample(1:k, size=m, replace=T)
>
> system.time(split(as.data.frame(x),f))
>
> system.time(lapply(
> 1:dim(x)[[2]]
> , function(i) {
> split(x[,i],f)
> }
> )
> )
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
David Winsemius, MD
Heritage Laboratories
West Hartford, CT