[R] big speed difference in source btw. R 2.15.2 and R 3.0.2 ?

Heinz Tuechler tuechler at gmx.at
Wed Oct 30 21:42:53 CET 2013


Many thanks for confirming my impression. I use dump for storing large
data.frames with a number of attributes for each variable. save/load is
much faster, but I am unsure whether such files will be readable by R
versions released years later.
What format/functions would you suggest for data storage/transfer 
between different (future) R versions?
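A minimal sketch comparing the two routes mentioned above on a small data.frame whose columns carry extra attributes. The attribute names "label" and "units" are hypothetical and chosen only for illustration. The sketch only checks that dump()/source() and saveRDS()/readRDS() both restore the per-column attributes within one R session. It says nothing by itself about whether the files will be readable by future R versions.

```r
# Hypothetical example data: the attribute names "label" and "units"
# are made up for illustration.
df <- data.frame(x = c(1.5, 2.25, 3), g = c("a", "b", "a"),
                 stringsAsFactors = FALSE)
attr(df$x, "label") <- "Measured value"
attr(df$x, "units") <- "mm"
attr(df$g, "label") <- "Group"

# Route 1: write R source text with dump(), read it back with source()
# into a separate environment so the original 'df' is not overwritten.
f.dump <- tempfile(fileext = ".R")
dump("df", file = f.dump)
env <- new.env()
source(f.dump, local = env, keep.source = FALSE)

# Route 2: serialize with saveRDS(); ascii = TRUE writes a plain-text
# (rather than binary) form of R's serialization format.
f.rds <- tempfile(fileext = ".rds")
saveRDS(df, file = f.rds, ascii = TRUE)
df.rds <- readRDS(f.rds)

# Both restored objects should match the original, attributes included.
same.dump <- isTRUE(all.equal(env$df, df))
same.rds  <- identical(df.rds, df)
print(c(dump.source = same.dump, saveRDS.readRDS = same.rds))
unlink(c(f.dump, f.rds))
```

Setting ascii = TRUE is optional. It gives larger files but keeps them as plain text rather than binary.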

best regards,
Heinz

On 30.10.2013 20:11, William Dunlap wrote:
> I see a big 2.15.2/3.0.2 speed difference in parse() (which is used by source())
> when it is parsing long vectors of numeric data.  dump/source has never been an efficient
> way of transferring data between different R sessions, but it is much worse
> now for long vectors.   In 2.15.2 doubling the size of the vector (of lengths
> in the range 10^4 to 10^7) makes the time to parse go up by a factor of c. 2.1.
> In 3.0.2 that factor is more like 4.4.
>
>         n elapsed-2.15.2 elapsed-3.0.2
>      2048          0.003         0.018
>      4096          0.006         0.065
>      8192          0.013         0.254
>     16384          0.025         1.067
>     32768          0.050         4.114
>     65536          0.100        16.236
>    131072          0.219        66.013
>    262144          0.808       291.883
>    524288          2.022      1285.265
>   1048576          4.918            NA
>   2097152          9.857            NA
>   4194304         22.916            NA
>   8388608         49.671            NA
> 16777216        101.042            NA
> 33554432        512.719            NA
>
> I tried this with 64-bit R on a Linux box.  The NAs represent sizes that did not
> finish while I was at a 1 1/2 hour dentist's appointment.  The timing function
> was:
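>    test <- function(n = 2^(11:25))
>    {
>        tf <- tempfile()
>        on.exit(unlink(tf))   # remove the temporary file on exit
>        t(sapply(n, function(n){
>            # write a numeric vector of length n as R source text
>            dput(log(seq_len(n)), file=tf)
>            # time parsing it back (user, system, elapsed)
>            print(c(n=n, system.time(parse(file=tf))[1:3]))
>        }))
>    }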
>
> Bill Dunlap
> Spotfire, TIBCO Software
> wdunlap tibco.com
>
>
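The per-doubling factors above translate directly into a growth exponent. If the parse time behaves like c * n^k, doubling n multiplies the time by 2^k. A factor of about 2.1 therefore means k = log2(2.1), about 1.07 (roughly linear), and a factor of about 4.4 means k = log2(4.4), about 2.14 (roughly quadratic). The sketch below assumes such a power law and recomputes k from the timings in the table, using only the measured (non-NA) values. The helper name growth.exponent is made up for illustration.

```r
# Elapsed seconds copied from the table above; the 3.0.2 column stops
# at n = 524288 because larger runs did not finish.
t2152 <- c(0.003, 0.006, 0.013, 0.025, 0.050, 0.100, 0.219, 0.808,
           2.022, 4.918, 9.857, 22.916, 49.671, 101.042, 512.719)
t302  <- c(0.018, 0.065, 0.254, 1.067, 4.114, 16.236, 66.013,
           291.883, 1285.265)

# Hypothetical helper: if time ~ c * n^k and n doubles between rows,
# then k = log2(t[i+1] / t[i]) for each pair of successive rows.
growth.exponent <- function(t) log2(t[-1] / t[-length(t)])

k2152 <- growth.exponent(t2152)
k302  <- growth.exponent(t302)
print(round(c(median.2.15.2 = median(k2152),
              median.3.0.2  = median(k302)), 2))
```

With these numbers the median exponent comes out near 1.1 for 2.15.2 and near 2.0 for 3.0.2. The last 2.15.2 step (to n = 33554432) is an outlier at about 2.3.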
>> -----Original Message-----
>> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf
>> Of Carl Witthoft
>> Sent: Wednesday, October 30, 2013 5:29 AM
>> To: r-help at r-project.org
>> Subject: Re: [R] big speed difference in source btw. R 2.15.2 and R 3.0.2 ?
>>
>> Did you run the identical code on the identical machine, and did you verify
>> there were no other tasks running which might have limited the RAM available
>> to R?  And equally important, did you run these tests in the reverse order
>> (in case R was storing large objects from the first run, thus chewing up
>> RAM)?
>>
>>
>>
>> Dear All,
>>
>> is it known that source() works much faster in R 2.15.2 than in R 3.0.2?
>> In the example below, for a data.frame with 10^7 rows, I observe the
>> following timings:
>>
>> R version 2.15.2 Patched (2012-11-29 r61184)
>> length: 1e+07
>>      user  system elapsed
>>     62.04    0.22   62.26
>>
>> R version 3.0.2 Patched (2013-10-27 r64116)
>> length: 1e+07
>>      user  system elapsed
>>    388.63  176.42  566.41
>>
>> Is there a way to speed R version 3.0.2 up to the performance of R
>> version 2.15.2?
>>
>> best regards,
>>
>> Heinz Tüchler
>>
>>
>> example:
>> sessionInfo()
>> sample.vec <-
>>     c('source', 'causes', 'R', 'to', 'accept', 'its', 'input', 'from', 'the',
>>       'named', 'file', 'or', 'URL', 'or', 'connection')
>> dmp.size <- c(10^(1:7))
>> set.seed(37)
>>
>> for(i in dmp.size) {
>>     df0 <- data.frame(x=sample(sample.vec, i, replace=TRUE))
>>     dump('df0', file='testdump')
>>     cat('length:', i, '\n')
>>     print(system.time(source('testdump', keep.source = FALSE,
>>                              encoding='')))
>> }
>>
>> output for R version 2.15.2 Patched (2012-11-29 r61184):
>>> sessionInfo()
>> R version 2.15.2 Patched (2012-11-29 r61184)
>> Platform: x86_64-w64-mingw32/x64 (64-bit)
>>
>> locale:
>> [1] LC_COLLATE=German_Switzerland.1252  LC_CTYPE=German_Switzerland.1252
>> [3] LC_MONETARY=German_Switzerland.1252 LC_NUMERIC=C
>> [5] LC_TIME=German_Switzerland.1252
>>
>> attached base packages:
>> [1] stats     graphics  grDevices utils     datasets  methods   base
>>> sample.vec <-
>> +   c('source', 'causes', 'R', 'to', 'accept', 'its', 'input', 'from', 'the',
>> +     'named', 'file', 'or', 'URL', 'or', 'connection')
>>> dmp.size <- c(10^(1:7))
>>> set.seed(37)
>>>
>>> for(i in dmp.size) {
>> +   df0 <- data.frame(x=sample(sample.vec, i, replace=TRUE))
>> +   dump('df0', file='testdump')
>> +   cat('length:', i, '\n')
>> +   print(system.time(source('testdump', keep.source = FALSE,
>> +                            encoding='')))
>> + }
>> length: 10
>>      user  system elapsed
>>         0       0       0
>> length: 100
>>      user  system elapsed
>>         0       0       0
>> length: 1000
>>      user  system elapsed
>>         0       0       0
>> length: 10000
>>      user  system elapsed
>>      0.02    0.00    0.01
>> length: 1e+05
>>      user  system elapsed
>>      0.21    0.00    0.20
>> length: 1e+06
>>      user  system elapsed
>>      4.47    0.04    4.51
>> length: 1e+07
>>      user  system elapsed
>>     62.04    0.22   62.26
>>>
>>
>>
>> output for R version 3.0.2 Patched (2013-10-27 r64116):
>>> sessionInfo()
>> R version 3.0.2 Patched (2013-10-27 r64116)
>> Platform: x86_64-w64-mingw32/x64 (64-bit)
>>
>> locale:
>> [1] LC_COLLATE=German_Switzerland.1252  LC_CTYPE=German_Switzerland.1252
>> [3] LC_MONETARY=German_Switzerland.1252 LC_NUMERIC=C
>> [5] LC_TIME=German_Switzerland.1252
>>
>> attached base packages:
>> [1] stats     graphics  grDevices utils     datasets  methods   base
>>> sample.vec <-
>> +   c('source', 'causes', 'R', 'to', 'accept', 'its', 'input', 'from', 'the',
>> +     'named', 'file', 'or', 'URL', 'or', 'connection')
>>> dmp.size <- c(10^(1:7))
>>> set.seed(37)
>>>
>>> for(i in dmp.size) {
>> +   df0 <- data.frame(x=sample(sample.vec, i, replace=TRUE))
>> +   dump('df0', file='testdump')
>> +   cat('length:', i, '\n')
>> +   print(system.time(source('testdump', keep.source = FALSE,
>> +                            encoding='')))
>> + }
>> length: 10
>>      user  system elapsed
>>         0       0       0
>> length: 100
>>      user  system elapsed
>>         0       0       0
>> length: 1000
>>      user  system elapsed
>>         0       0       0
>> length: 10000
>>      user  system elapsed
>>      0.01    0.00    0.01
>> length: 1e+05
>>      user  system elapsed
>>      0.36    0.06    0.42
>> length: 1e+06
>>      user  system elapsed
>>      6.02    1.86    7.88
>> length: 1e+07
>>      user  system elapsed
>>    388.63  176.42  566.41
>>>
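As a rough comparison with the save-based route, here is a sketch that repeats the construction from the example at small sizes. It times source() on the dump file against readRDS() on a file written by saveRDS(). The helper name compare.once and the temporary files are made up for illustration. Absolute timings depend on machine and R version, so no numbers are claimed here. readRDS() deserializes the object directly instead of parsing R code, so it is not expected to show the parse() slowdown discussed in this thread.

```r
# Words used by the original example to build the data.frame.
sample.vec <- c('source', 'causes', 'R', 'to', 'accept', 'its', 'input',
                'from', 'the', 'named', 'file', 'or', 'URL', 'or',
                'connection')
set.seed(37)
f.dump <- tempfile(fileext = ".R")
f.rds  <- tempfile(fileext = ".rds")

# Hypothetical helper: build a data.frame of i rows, store it both ways,
# and time reading each file back.
compare.once <- function(i) {
  df0 <- data.frame(x = sample(sample.vec, i, replace = TRUE))
  dump("df0", file = f.dump)
  saveRDS(df0, file = f.rds)
  env <- new.env()
  t.source <- system.time(
    source(f.dump, local = env, keep.source = FALSE))[["elapsed"]]
  t.rds <- system.time(df.rds <- readRDS(f.rds))[["elapsed"]]
  # 'same' is 1 if both routes restored an object equal to df0.
  c(n = i, source = t.source, readRDS = t.rds,
    same = isTRUE(all.equal(env$df0, df0)) && identical(df.rds, df0))
}

res <- t(sapply(10^(2:4), compare.once))
print(res)
unlink(c(f.dump, f.rds))
```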
>>
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.