[R] big speed difference in source btw. R 2.15.2 and R 3.0.2 ?
Heinz Tuechler
tuechler at gmx.at
Wed Oct 30 21:42:53 CET 2013
Many thanks for confirming my impression. I use dump to store large
data.frames that carry a number of attributes on each variable. save/load is
much faster, but I am unsure whether such files will still be readable by R
versions released years from now.
What format or functions would you suggest for storing data and transferring
it between different (future) R versions?
best regards,
Heinz
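
One option that may be worth weighing for the question above is saveRDS/readRDS, which serialize a single object in R's binary format and keep per-variable attributes. The sketch below is only an illustration: the data.frame, the attribute names ("label", "units") and the temporary file are made up for the example. The help page for load says it can read objects saved in the current or any earlier format; that describes how R has behaved so far and is not a promise about future releases.

```r
# Hypothetical example data: a data.frame whose columns carry extra attributes.
# The attribute names "label" and "units" are made up for this illustration.
df0 <- data.frame(x = c("a", "b", "c"), y = c(1.5, 2.5, 3.5),
                  stringsAsFactors = FALSE)
attr(df0$x, "label") <- "example text variable"
attr(df0$y, "units") <- "kg"

rds_file <- tempfile(fileext = ".rds")
saveRDS(df0, rds_file)      # binary serialization of one object
df1 <- readRDS(rds_file)    # returns the object; the caller picks the name
unlink(rds_file)

# Check that the round trip kept the data and the attributes.
roundtrip_ok <- identical(df0, df1)
print(roundtrip_ok)
print(attributes(df1$y))
```

Unlike dump/source, nothing has to be parsed on reading, so the parser slowdown discussed in this thread should not apply to this route.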
On 30.10.2013 20:11, William Dunlap wrote:
> I see a big 2.15.2/3.0.2 speed difference in parse() (which is used by source())
> when it is parsing long vectors of numeric data. dump/source has never been an efficient
> way of transferring data between different R sessions, but it is much worse
> now for long vectors. In 2.15.2 doubling the size of the vector (for lengths
> in the range 10^4 to 10^7) makes the time to parse go up by a factor of about 2.1.
> In 3.0.2 that factor is more like 4.4.
>
>         n  elapsed-2.15.2  elapsed-3.0.2
>      2048           0.003          0.018
>      4096           0.006          0.065
>      8192           0.013          0.254
>     16384           0.025          1.067
>     32768           0.050          4.114
>     65536           0.100         16.236
>    131072           0.219         66.013
>    262144           0.808        291.883
>    524288           2.022       1285.265
>   1048576           4.918             NA
>   2097152           9.857             NA
>   4194304          22.916             NA
>   8388608          49.671             NA
>  16777216         101.042             NA
>  33554432         512.719             NA
>
> I tried this with 64-bit R on a Linux box. The NAs represent sizes that did not
> finish while I was at a 1 1/2 hour dentist's appointment. The timing function
> was:
> test <- function(n = 2^(11:25))
> {
>     tf <- tempfile()
>     on.exit(unlink(tf))
>     t(sapply(n, function(n){
>         dput(log(seq_len(n)), file=tf)
>         print(c(n=n, system.time(parse(file=tf))[1:3]))
>     }))
> }
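
The doubling factors quoted above can be turned into scaling exponents: a factor of about 2 per doubling of n corresponds to roughly linear time, and a factor of about 4 to roughly quadratic time. The sketch below recomputes this from the posted timings. The variable and function names are made up for the illustration, and the result is only a reading of these numbers, not a diagnosis of the parser.

```r
# Timings copied from the table above (elapsed seconds); NA = did not finish.
n <- 2^(11:25)
elapsed_2152 <- c(0.003, 0.006, 0.013, 0.025, 0.050, 0.100, 0.219, 0.808,
                  2.022, 4.918, 9.857, 22.916, 49.671, 101.042, 512.719)
elapsed_302 <- c(0.018, 0.065, 0.254, 1.067, 4.114, 16.236, 66.013,
                 291.883, 1285.265, rep(NA, 6))

# Local scaling exponent between successive sizes: log2 of the time ratio
# divided by log2 of the size ratio (1 = linear, 2 = quadratic).
local_exponent <- function(n, elapsed) diff(log2(elapsed)) / diff(log2(n))

exp_2152 <- local_exponent(n, elapsed_2152)
exp_302  <- local_exponent(n, elapsed_302)

print(round(cbind(n = n[-1], exp_2152, exp_302), 2))
print(c(median_2152 = median(exp_2152),
        median_302  = median(exp_302, na.rm = TRUE)))
```

The very small timings at the low end are close to the timer resolution, so the exponents for the first few sizes are noisy.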
>
> Bill Dunlap
> Spotfire, TIBCO Software
> wdunlap tibco.com
>
>
>> -----Original Message-----
>> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf
>> Of Carl Witthoft
>> Sent: Wednesday, October 30, 2013 5:29 AM
>> To: r-help at r-project.org
>> Subject: Re: [R] big speed difference in source btw. R 2.15.2 and R 3.0.2 ?
>>
>> Did you run the identical code on the identical machine, and did you verify
>> there were no other tasks running which might have limited the RAM available
>> to R? And equally important, did you run these tests in the reverse order
>> (in case R was storing large objects from the first run, thus chewing up
>> RAM)?
>>
>>
>>
>> Dear All,
>>
>> is it known that source works much faster in R 2.15.2 than in R 3.0.2 ?
>> In the example below I observe e.g. for a data.frame with 10^7 rows the
>> following timings:
>>
>> R version 2.15.2 Patched (2012-11-29 r61184)
>> length: 1e+07
>> user system elapsed
>> 62.04 0.22 62.26
>>
>> R version 3.0.2 Patched (2013-10-27 r64116)
>> length: 1e+07
>> user system elapsed
>> 388.63 176.42 566.41
>>
>> Is there a way to speed R version 3.0.2 up to the performance of R
>> version 2.15.2?
>>
>> best regards,
>>
>> Heinz Tüchler
>>
>>
>> example:
>> sessionInfo()
>> sample.vec <-
>> c('source', 'causes', 'R', 'to', 'accept', 'its', 'input', 'from', 'the',
>> 'named', 'file', 'or', 'URL', 'or', 'connection')
>> dmp.size <- c(10^(1:7))
>> set.seed(37)
>>
>> for(i in dmp.size) {
>>     df0 <- data.frame(x=sample(sample.vec, i, replace=TRUE))
>>     dump('df0', file='testdump')
>>     cat('length:', i, '\n')
>>     print(system.time(source('testdump', keep.source = FALSE,
>>                              encoding='')))
>> }
>>
>> output for R version 2.15.2 Patched (2012-11-29 r61184):
>>> sessionInfo()
>> R version 2.15.2 Patched (2012-11-29 r61184)
>> Platform: x86_64-w64-mingw32/x64 (64-bit)
>>
>> locale:
>> [1] LC_COLLATE=German_Switzerland.1252 LC_CTYPE=German_Switzerland.1252
>> [3] LC_MONETARY=German_Switzerland.1252 LC_NUMERIC=C
>> [5] LC_TIME=German_Switzerland.1252
>>
>> attached base packages:
>> [1] stats graphics grDevices utils datasets methods base
>>> sample.vec <-
>> + c('source', 'causes', 'R', 'to', 'accept', 'its', 'input', 'from', 'the',
>> + 'named', 'file', 'or', 'URL', 'or', 'connection')
>>> dmp.size <- c(10^(1:7))
>>> set.seed(37)
>>>
>>> for(i in dmp.size) {
>> + df0 <- data.frame(x=sample(sample.vec, i, replace=TRUE))
>> + dump('df0', file='testdump')
>> + cat('length:', i, '\n')
>> + print(system.time(source('testdump', keep.source = FALSE,
>> + encoding='')))
>> + }
>> length: 10
>> user system elapsed
>> 0 0 0
>> length: 100
>> user system elapsed
>> 0 0 0
>> length: 1000
>> user system elapsed
>> 0 0 0
>> length: 10000
>> user system elapsed
>> 0.02 0.00 0.01
>> length: 1e+05
>> user system elapsed
>> 0.21 0.00 0.20
>> length: 1e+06
>> user system elapsed
>> 4.47 0.04 4.51
>> length: 1e+07
>> user system elapsed
>> 62.04 0.22 62.26
>>>
>>
>>
>> output for R version 3.0.2 Patched (2013-10-27 r64116):
>>> sessionInfo()
>> R version 3.0.2 Patched (2013-10-27 r64116)
>> Platform: x86_64-w64-mingw32/x64 (64-bit)
>>
>> locale:
>> [1] LC_COLLATE=German_Switzerland.1252 LC_CTYPE=German_Switzerland.1252
>> [3] LC_MONETARY=German_Switzerland.1252 LC_NUMERIC=C
>> [5] LC_TIME=German_Switzerland.1252
>>
>> attached base packages:
>> [1] stats graphics grDevices utils datasets methods base
>>> sample.vec <-
>> + c('source', 'causes', 'R', 'to', 'accept', 'its', 'input', 'from', 'the',
>> + 'named', 'file', 'or', 'URL', 'or', 'connection')
>>> dmp.size <- c(10^(1:7))
>>> set.seed(37)
>>>
>>> for(i in dmp.size) {
>> + df0 <- data.frame(x=sample(sample.vec, i, replace=TRUE))
>> + dump('df0', file='testdump')
>> + cat('length:', i, '\n')
>> + print(system.time(source('testdump', keep.source = FALSE,
>> + encoding='')))
>> + }
>> length: 10
>> user system elapsed
>> 0 0 0
>> length: 100
>> user system elapsed
>> 0 0 0
>> length: 1000
>> user system elapsed
>> 0 0 0
>> length: 10000
>> user system elapsed
>> 0.01 0.00 0.01
>> length: 1e+05
>> user system elapsed
>> 0.36 0.06 0.42
>> length: 1e+06
>> user system elapsed
>> 6.02 1.86 7.88
>> length: 1e+07
>> user system elapsed
>> 388.63 176.42 566.41
>>>
>>
>>
>>
>>
>>
>> --
>> View this message in context: http://r.789695.n4.nabble.com/big-speed-difference-in-source-btw-R-2-15-2-and-R-3-0-2-tp4679314p4679346.html
>> Sent from the R help mailing list archive at Nabble.com.
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.