[R] big speed difference in source btw. R 2.15.2 and R 3.0.2 ?

Heinz Tuechler tuechler at gmx.at
Wed Oct 30 13:49:02 CET 2013


Everything was run on the same machine, in independent sessions. I did
not restart Windows. I also tried 32-bit R 3.0.2, and it seemed slightly
faster than the 64-bit build.
Judging by Process Explorer v15.23
(http://technet.microsoft.com/de-de/sysinternals/bb896653), my impression
is that R 3.0.2 manages memory differently from R 2.15.2. When sourcing a
big file, the physical memory used grows steadily in R 2.15.2, whereas
in R 3.0.2 it cycles between growing and shrinking.
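
To put numbers on the Process Explorer impression, something like the
following could be run in a fresh session of each R version. It is only
a sketch: measure_source() is a made-up helper name, not part of R, and
the "max used" figure from gc() describes R's own heap, which need not
match the physical memory that Process Explorer reports.

```r
# Sketch: time one source() call and record R's peak heap use during it.
# measure_source() is a hypothetical helper, not an R function.
measure_source <- function(file) {
    invisible(gc(reset = TRUE))        # reset the "max used" counters
    timing <- system.time(
        # evaluate in a throwaway environment so nothing is overwritten
        source(file, local = new.env(), keep.source = FALSE)
    )
    mem <- gc()                        # "max used" now covers the source() call
    list(elapsed     = timing[["elapsed"]],
         # last column of gc() is "max used" in Mb; rows are Ncells, Vcells
         max_used_mb = sum(mem[, ncol(mem)]))
}

# Small usage example with a temporary dump file
df0 <- data.frame(x = sample(letters, 1000, replace = TRUE))
tmp <- tempfile()
dump("df0", file = tmp)
res <- measure_source(tmp)
print(res)
```

Comparing res$elapsed and res$max_used_mb between the two versions for
the same dump file would show whether the slowdown goes along with a
higher peak or only with more frequent collections.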

best,
Heinz

On 30.10.2013 13:28, Carl Witthoft wrote:
> Did you run the identical code on the identical machine, and did you verify
> there were no other tasks running which might have limited the RAM available
> to R?  And equally important, did you run these tests in the reverse order
> (in case R was storing large objects from the first run, thus chewing up
> RAM)?
>
>
>
> Dear All,
>
> is it known that source works much faster in R 2.15.2 than in R 3.0.2?
> In the example below I observe e.g. for a data.frame with 10^7 rows the
> following timings:
>
> R version 2.15.2 Patched (2012-11-29 r61184)
> length: 1e+07
>      user  system elapsed
>     62.04    0.22   62.26
>
> R version 3.0.2 Patched (2013-10-27 r64116)
> length: 1e+07
>      user  system elapsed
>    388.63  176.42  566.41
>
> Is there a way to speed R version 3.0.2 up to the performance of R
> version 2.15.2?
>
> best regards,
>
> Heinz Tüchler
>
>
> example:
> sessionInfo()
> sample.vec <-
>     c('source', 'causes', 'R', 'to', 'accept', 'its', 'input', 'from', 'the',
>       'named', 'file', 'or', 'URL', 'or', 'connection')
> dmp.size <- c(10^(1:7))
> set.seed(37)
>
> for(i in dmp.size) {
>     df0 <- data.frame(x=sample(sample.vec, i, replace=TRUE))
>     dump('df0', file='testdump')
>     cat('length:', i, '\n')
>     print(system.time(source('testdump', keep.source = FALSE,
>                              encoding='')))
> }
>
> output for R version 2.15.2 Patched (2012-11-29 r61184):
>> sessionInfo()
> R version 2.15.2 Patched (2012-11-29 r61184)
> Platform: x86_64-w64-mingw32/x64 (64-bit)
>
> locale:
> [1] LC_COLLATE=German_Switzerland.1252  LC_CTYPE=German_Switzerland.1252
> [3] LC_MONETARY=German_Switzerland.1252 LC_NUMERIC=C
> [5] LC_TIME=German_Switzerland.1252
>
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
>> sample.vec <-
> +   c('source', 'causes', 'R', 'to', 'accept', 'its', 'input', 'from', 'the',
> +     'named', 'file', 'or', 'URL', 'or', 'connection')
>> dmp.size <- c(10^(1:7))
>> set.seed(37)
>>
>> for(i in dmp.size) {
> +   df0 <- data.frame(x=sample(sample.vec, i, replace=TRUE))
> +   dump('df0', file='testdump')
> +   cat('length:', i, '\n')
> +   print(system.time(source('testdump', keep.source = FALSE,
> +                            encoding='')))
> + }
> length: 10
>      user  system elapsed
>         0       0       0
> length: 100
>      user  system elapsed
>         0       0       0
> length: 1000
>      user  system elapsed
>         0       0       0
> length: 10000
>      user  system elapsed
>      0.02    0.00    0.01
> length: 1e+05
>      user  system elapsed
>      0.21    0.00    0.20
> length: 1e+06
>      user  system elapsed
>      4.47    0.04    4.51
> length: 1e+07
>      user  system elapsed
>     62.04    0.22   62.26
>>
>
>
> output for R version 3.0.2 Patched (2013-10-27 r64116):
>> sessionInfo()
> R version 3.0.2 Patched (2013-10-27 r64116)
> Platform: x86_64-w64-mingw32/x64 (64-bit)
>
> locale:
> [1] LC_COLLATE=German_Switzerland.1252  LC_CTYPE=German_Switzerland.1252
> [3] LC_MONETARY=German_Switzerland.1252 LC_NUMERIC=C
> [5] LC_TIME=German_Switzerland.1252
>
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
>> sample.vec <-
> +   c('source', 'causes', 'R', 'to', 'accept', 'its', 'input', 'from', 'the',
> +     'named', 'file', 'or', 'URL', 'or', 'connection')
>> dmp.size <- c(10^(1:7))
>> set.seed(37)
>>
>> for(i in dmp.size) {
> +   df0 <- data.frame(x=sample(sample.vec, i, replace=TRUE))
> +   dump('df0', file='testdump')
> +   cat('length:', i, '\n')
> +   print(system.time(source('testdump', keep.source = FALSE,
> +                            encoding='')))
> + }
> length: 10
>      user  system elapsed
>         0       0       0
> length: 100
>      user  system elapsed
>         0       0       0
> length: 1000
>      user  system elapsed
>         0       0       0
> length: 10000
>      user  system elapsed
>      0.01    0.00    0.01
> length: 1e+05
>      user  system elapsed
>      0.36    0.06    0.42
> length: 1e+06
>      user  system elapsed
>      6.02    1.86    7.88
> length: 1e+07
>      user  system elapsed
>    388.63  176.42  566.41
>>