[R] big speed difference in source btw. R 2.15.2 and R 3.0.2 ?
William Dunlap
wdunlap at tibco.com
Wed Oct 30 20:11:33 CET 2013
I see a big speed difference between 2.15.2 and 3.0.2 in parse() (which is
used by source()) when it is parsing long vectors of numeric data. dump/source
has never been an efficient way of transferring data between R sessions, but
it is much worse now for long vectors. In 2.15.2, doubling the length of the
vector (for lengths in the range 10^4 to 10^7) makes the parse time go up by a
factor of about 2.1. In 3.0.2 that factor is more like 4.4.
       n  elapsed-2.15.2 (s)  elapsed-3.0.2 (s)
    2048               0.003              0.018
    4096               0.006              0.065
    8192               0.013              0.254
   16384               0.025              1.067
   32768               0.050              4.114
   65536               0.100             16.236
  131072               0.219             66.013
  262144               0.808            291.883
  524288               2.022           1285.265
 1048576               4.918                 NA
 2097152               9.857                 NA
 4194304              22.916                 NA
 8388608              49.671                 NA
16777216             101.042                 NA
33554432             512.719                 NA
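The doubling factors quoted above can be checked directly from the table. The following is only a sketch: the names t_old, t_new, doubling and summarize_growth are made up for illustration, and only the sizes that finished under both versions are used. A factor near 2 per doubling corresponds to roughly linear growth in n, and a factor near 4 to roughly quadratic growth.

```r
# Elapsed times in seconds, copied from the table above.
# Only the sizes that completed under both versions are included.
n     <- 2^(11:19)
t_old <- c(0.003, 0.006, 0.013, 0.025, 0.050, 0.100, 0.219, 0.808, 2.022)        # 2.15.2
t_new <- c(0.018, 0.065, 0.254, 1.067, 4.114, 16.236, 66.013, 291.883, 1285.265) # 3.0.2

# Ratio of each timing to the previous one, i.e. the cost of doubling n.
doubling <- function(x) x[-1] / x[-length(x)]

# Median doubling factor and the implied exponent k in time ~ n^k.
summarize_growth <- function(x) {
    f <- median(doubling(x))
    c(factor = f, exponent = log2(f))
}

growth <- rbind("2.15.2" = summarize_growth(t_old),
                "3.0.2"  = summarize_growth(t_new))
print(round(growth, 2))
```

The median is used here only to damp the noise in the very small timings; looking at the last few ratios alone gives a somewhat larger factor for 3.0.2.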
I tried this with 64-bit R on a Linux box. The NAs mark sizes that did not
finish while I was at a 1 1/2 hour dentist's appointment. The timing function
was:
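test <- function(n = 2^(11:25))
{
    tf <- tempfile()
    on.exit(unlink(tf))
    t(sapply(n, function(n){
        # write a length-n numeric vector to tf as R source text
        dput(log(seq_len(n)), file=tf)
        # time parse() alone; print progress and return user/system/elapsed
        print(c(n=n, system.time(parse(file=tf))[1:3]))
    }))
}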
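For actually moving data between sessions, as opposed to timing parse(), a binary format avoids the parser altogether. A minimal sketch using base R's saveRDS() and readRDS(); the vector x and the temporary file name are illustrative only, and no timings are claimed here:

```r
# Illustrative vector of the same kind as in the timing test above.
x <- log(seq_len(1e5))

rds_file <- tempfile(fileext = ".rds")

# Write the object in R's binary serialization format and read it back.
saveRDS(x, rds_file)
y <- readRDS(rds_file)
unlink(rds_file)

# The binary round trip reproduces the vector exactly.
stopifnot(identical(x, y))
```

Wrapping the readRDS() call in system.time() for the same sizes would give a comparison against the parse() timings above.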
Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com
> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf
> Of Carl Witthoft
> Sent: Wednesday, October 30, 2013 5:29 AM
> To: r-help at r-project.org
> Subject: Re: [R] big speed difference in source btw. R 2.15.2 and R 3.0.2 ?
>
> Did you run the identical code on the identical machine, and did you verify
> there were no other tasks running which might have limited the RAM available
> to R? And equally important, did you run these tests in the reverse order
> (in case R was storing large objects from the first run, thus chewing up
> RAM)?
>
>
>
> Dear All,
>
> is it known that source works much faster in R 2.15.2 than in R 3.0.2 ?
> In the example below I observe e.g. for a data.frame with 10^7 rows the
> following timings:
>
> R version 2.15.2 Patched (2012-11-29 r61184)
> length: 1e+07
> user system elapsed
> 62.04 0.22 62.26
>
> R version 3.0.2 Patched (2013-10-27 r64116)
> length: 1e+07
> user system elapsed
> 388.63 176.42 566.41
>
> Is there a way to speed R version 3.0.2 up to the performance of R
> version 2.15.2?
>
> best regards,
>
> Heinz Tüchler
>
>
> example:
> sessionInfo()
> sample.vec <-
> c('source', 'causes', 'R', 'to', 'accept', 'its', 'input', 'from', 'the',
> 'named', 'file', 'or', 'URL', 'or', 'connection')
> dmp.size <- c(10^(1:7))
> set.seed(37)
>
> for(i in dmp.size) {
>   df0 <- data.frame(x=sample(sample.vec, i, replace=TRUE))
>   dump('df0', file='testdump')
>   cat('length:', i, '\n')
>   print(system.time(source('testdump', keep.source = FALSE,
>                            encoding='')))
> }
>
> output for R version 2.15.2 Patched (2012-11-29 r61184):
> > sessionInfo()
> R version 2.15.2 Patched (2012-11-29 r61184)
> Platform: x86_64-w64-mingw32/x64 (64-bit)
>
> locale:
> [1] LC_COLLATE=German_Switzerland.1252 LC_CTYPE=German_Switzerland.1252
> [3] LC_MONETARY=German_Switzerland.1252 LC_NUMERIC=C
> [5] LC_TIME=German_Switzerland.1252
>
> attached base packages:
> [1] stats graphics grDevices utils datasets methods base
> > sample.vec <-
> + c('source', 'causes', 'R', 'to', 'accept', 'its', 'input', 'from', 'the',
> + 'named', 'file', 'or', 'URL', 'or', 'connection')
> > dmp.size <- c(10^(1:7))
> > set.seed(37)
> >
> > for(i in dmp.size) {
> + df0 <- data.frame(x=sample(sample.vec, i, replace=TRUE))
> + dump('df0', file='testdump')
> + cat('length:', i, '\n')
> + print(system.time(source('testdump', keep.source = FALSE,
> + encoding='')))
> + }
> length: 10
> user system elapsed
> 0 0 0
> length: 100
> user system elapsed
> 0 0 0
> length: 1000
> user system elapsed
> 0 0 0
> length: 10000
> user system elapsed
> 0.02 0.00 0.01
> length: 1e+05
> user system elapsed
> 0.21 0.00 0.20
> length: 1e+06
> user system elapsed
> 4.47 0.04 4.51
> length: 1e+07
> user system elapsed
> 62.04 0.22 62.26
> >
>
>
> output for R version 3.0.2 Patched (2013-10-27 r64116):
> > sessionInfo()
> R version 3.0.2 Patched (2013-10-27 r64116)
> Platform: x86_64-w64-mingw32/x64 (64-bit)
>
> locale:
> [1] LC_COLLATE=German_Switzerland.1252 LC_CTYPE=German_Switzerland.1252
> [3] LC_MONETARY=German_Switzerland.1252 LC_NUMERIC=C
> [5] LC_TIME=German_Switzerland.1252
>
> attached base packages:
> [1] stats graphics grDevices utils datasets methods base
> > sample.vec <-
> + c('source', 'causes', 'R', 'to', 'accept', 'its', 'input', 'from', 'the',
> + 'named', 'file', 'or', 'URL', 'or', 'connection')
> > dmp.size <- c(10^(1:7))
> > set.seed(37)
> >
> > for(i in dmp.size) {
> + df0 <- data.frame(x=sample(sample.vec, i, replace=TRUE))
> + dump('df0', file='testdump')
> + cat('length:', i, '\n')
> + print(system.time(source('testdump', keep.source = FALSE,
> + encoding='')))
> + }
> length: 10
> user system elapsed
> 0 0 0
> length: 100
> user system elapsed
> 0 0 0
> length: 1000
> user system elapsed
> 0 0 0
> length: 10000
> user system elapsed
> 0.01 0.00 0.01
> length: 1e+05
> user system elapsed
> 0.36 0.06 0.42
> length: 1e+06
> user system elapsed
> 6.02 1.86 7.88
> length: 1e+07
> user system elapsed
> 388.63 176.42 566.41
> >