[R] big speed difference in source btw. R 2.15.2 and R 3.0.2 ?

Carl Witthoft carl at witthoft.com
Wed Oct 30 13:28:47 CET 2013


Did you run the identical code on the identical machine, and did you verify
there were no other tasks running which might have limited the RAM available
to R?  And equally important, did you run these tests in the reverse order
(in case R was storing large objects from the first run, thus chewing up
RAM)?
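
One way to rule out the effects asked about above is to time each source()
call from a clean memory state, ideally in a fresh session for each R
version. The following is only a minimal sketch: the helper name
time_source(), the small test size and the use of a temporary file are
illustrative assumptions, not part of the original post.

```r
# Minimal sketch: time source() on a dumped object from a clean memory state.
# time_source() is a hypothetical helper introduced for illustration only.
time_source <- function(path) {
  invisible(gc(reset = TRUE))          # free unused memory, reset "max used" stats
  env <- new.env()                     # evaluate into a throwaway environment
  timing <- system.time(source(path, local = env, keep.source = FALSE))
  list(timing = timing, memory = gc()) # memory use right after the run
}

# Small illustrative data set (assumed size, much smaller than in the post)
x <- data.frame(x = sample(letters, 1000, replace = TRUE))
path <- tempfile("testdump")
dump("x", file = path)

res <- time_source(path)
print(res$timing)   # user / system / elapsed times
print(res$memory)   # gc() summary of memory in use after the run
unlink(path)
```

Running this in both R versions, once in each order, would show whether the
timings depend on what the session had already loaded.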



Dear All,

Is it known that source() works much faster in R 2.15.2 than in R 3.0.2?
In the example below I observe, e.g. for a data.frame with 10^7 rows, the
following timings:

R version 2.15.2 Patched (2012-11-29 r61184)
length: 1e+07
    user  system elapsed
   62.04    0.22   62.26

R version 3.0.2 Patched (2013-10-27 r64116)
length: 1e+07
    user  system elapsed
  388.63  176.42  566.41

Is there a way to speed R version 3.0.2 up to the performance of R 
version 2.15.2?

best regards,

Heinz Tüchler


example:
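sessionInfo()
# vocabulary from which the test column is sampled
sample.vec <-
   c('source', 'causes', 'R', 'to', 'accept', 'its', 'input', 'from', 'the',
     'named', 'file', 'or', 'URL', 'or', 'connection')
# numbers of rows to test: 10, 100, ..., 10^7
dmp.size <- c(10^(1:7))
set.seed(37)

for(i in dmp.size) {
   # build a one-column data.frame with i rows and dump it to a file
   df0 <- data.frame(x=sample(sample.vec, i, replace=TRUE))
   dump('df0', file='testdump')
   cat('length:', i, '\n')
   # time how long source() needs to read the dump back in
   print(system.time(source('testdump', keep.source = FALSE,
                            encoding='')))
}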

output for R version 2.15.2 Patched (2012-11-29 r61184):
> sessionInfo()
R version 2.15.2 Patched (2012-11-29 r61184)
Platform: x86_64-w64-mingw32/x64 (64-bit)

locale:
[1] LC_COLLATE=German_Switzerland.1252  LC_CTYPE=German_Switzerland.1252
[3] LC_MONETARY=German_Switzerland.1252 LC_NUMERIC=C
[5] LC_TIME=German_Switzerland.1252

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base
> sample.vec <-
+   c('source', 'causes', 'R', 'to', 'accept', 'its', 'input', 'from', 'the',
+     'named', 'file', 'or', 'URL', 'or', 'connection')
> dmp.size <- c(10^(1:7))
> set.seed(37)
>
> for(i in dmp.size) {
+   df0 <- data.frame(x=sample(sample.vec, i, replace=TRUE))
+   dump('df0', file='testdump')
+   cat('length:', i, '\n')
+   print(system.time(source('testdump', keep.source = FALSE,
+                            encoding='')))
+ }
length: 10
    user  system elapsed
       0       0       0
length: 100
    user  system elapsed
       0       0       0
length: 1000
    user  system elapsed
       0       0       0
length: 10000
    user  system elapsed
    0.02    0.00    0.01
length: 1e+05
    user  system elapsed
    0.21    0.00    0.20
length: 1e+06
    user  system elapsed
    4.47    0.04    4.51
length: 1e+07
    user  system elapsed
   62.04    0.22   62.26
>


output for R version 3.0.2 Patched (2013-10-27 r64116):
> sessionInfo()
R version 3.0.2 Patched (2013-10-27 r64116)
Platform: x86_64-w64-mingw32/x64 (64-bit)

locale:
[1] LC_COLLATE=German_Switzerland.1252  LC_CTYPE=German_Switzerland.1252
[3] LC_MONETARY=German_Switzerland.1252 LC_NUMERIC=C
[5] LC_TIME=German_Switzerland.1252

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base
> sample.vec <-
+   c('source', 'causes', 'R', 'to', 'accept', 'its', 'input', 'from', 'the',
+     'named', 'file', 'or', 'URL', 'or', 'connection')
> dmp.size <- c(10^(1:7))
> set.seed(37)
>
> for(i in dmp.size) {
+   df0 <- data.frame(x=sample(sample.vec, i, replace=TRUE))
+   dump('df0', file='testdump')
+   cat('length:', i, '\n')
+   print(system.time(source('testdump', keep.source = FALSE,
+                            encoding='')))
+ }
length: 10
    user  system elapsed
       0       0       0
length: 100
    user  system elapsed
       0       0       0
length: 1000
    user  system elapsed
       0       0       0
length: 10000
    user  system elapsed
    0.01    0.00    0.01
length: 1e+05
    user  system elapsed
    0.36    0.06    0.42
length: 1e+06
    user  system elapsed
    6.02    1.86    7.88
length: 1e+07
    user  system elapsed
  388.63  176.42  566.41
>







