[R] big speed difference in source btw. R 2.15.2 and R 3.0.2 ?
William Dunlap
wdunlap at tibco.com
Wed Oct 30 22:15:20 CET 2013
I have to defer to others for policy declarations like how long
the current format used by load and save should be readable.
Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com
> -----Original Message-----
> From: Heinz Tuechler [mailto:tuechler at gmx.at]
> Sent: Wednesday, October 30, 2013 1:43 PM
> To: William Dunlap
> Cc: Carl Witthoft; r-help at r-project.org
> Subject: Re: [R] big speed difference in source btw. R 2.15.2 and R 3.0.2 ?
>
> Many thanks for confirming my impression. I use dump for storing large
> data.frames with a number of attributes on each variable. save/load is
> much faster, but I am unsure whether such files will be readable by R
> versions years from now.
> What format/functions would you suggest for data storage/transfer
> between different (future) R versions?
>
> best regards,
> Heinz
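A minimal sketch (not from the thread) that checks, in the running R version only, whether both storage routes mentioned above keep per-variable attributes. It round-trips a small data.frame through dump()/source() and through save()/load() into fresh environments and compares the results with identical(). The column values and the attribute names "label" and "units" are made up for illustration. The check says nothing about whether files written today will be readable by future R versions.

```r
# Build a small data.frame whose variables carry extra (illustrative) attributes
df0 <- data.frame(x = c(1.5, 2.25, 3), y = c("a", "b", "c"),
                  stringsAsFactors = FALSE)
attr(df0$x, "label") <- "hypothetical label"
attr(df0$y, "units") <- "none"

tf_dump <- tempfile(fileext = ".R")
tf_save <- tempfile(fileext = ".RData")

# Text route: dump() writes R code, which source() parses and evaluates
dump("df0", file = tf_dump)
env_dump <- new.env()
source(tf_dump, local = env_dump)

# Binary route: save() serializes the object, load() restores it
save(df0, file = tf_save)
env_save <- new.env()
load(tf_save, envir = env_save)

# Both copies should match the original, attributes included
print(identical(env_dump$df0, df0))
print(identical(env_save$df0, df0))
unlink(c(tf_dump, tf_save))
```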
>
> On 30.10.2013 20:11, William Dunlap wrote:
> > I see a big 2.15.2/3.0.2 speed difference in parse() (which is used by source())
> > when it is parsing long vectors of numeric data. dump/source has never been an
> > efficient way of transferring data between different R sessions, but it is much
> > worse now for long vectors. In 2.15.2, doubling the length of the vector (for
> > lengths in the range 10^4 to 10^7) multiplies the parse time by a factor of
> > about 2.1. In 3.0.2 that factor is more like 4.4.
> >
> >         n  elapsed-2.15.2 (s)  elapsed-3.0.2 (s)
> >      2048               0.003              0.018
> >      4096               0.006              0.065
> >      8192               0.013              0.254
> >     16384               0.025              1.067
> >     32768               0.050              4.114
> >     65536               0.100             16.236
> >    131072               0.219             66.013
> >    262144               0.808            291.883
> >    524288               2.022           1285.265
> >   1048576               4.918                 NA
> >   2097152               9.857                 NA
> >   4194304              22.916                 NA
> >   8388608              49.671                 NA
> >  16777216             101.042                 NA
> >  33554432             512.719                 NA
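Reading those doubling factors as a power law (an assumption; the posts only report ratios): if parse time grows like t(n) ≈ c·n^k, each doubling of n multiplies the time by 2^k. This also gives a rough extrapolation for the first 3.0.2 size that did not finish.

```latex
\begin{aligned}
t(n) \approx c\,n^{k}
  &\;\Longrightarrow\; \frac{t(2n)}{t(n)} = 2^{k}
  \;\Longrightarrow\; k = \log_2 \frac{t(2n)}{t(n)},\\
\text{R 2.15.2:}\quad k &\approx \log_2 2.1 \approx 1.07 \quad (\text{roughly linear}),\\
\text{R 3.0.2:}\quad k &\approx \log_2 4.4 \approx 2.14 \quad (\text{roughly quadratic}),\\
t_{3.0.2}(2^{20}) &\approx 4.4 \times t_{3.0.2}(2^{19})
  = 4.4 \times 1285.265\ \text{s} \approx 5655\ \text{s} \approx 94\ \text{min}.
\end{aligned}
```

That estimate exceeds the 90-minute appointment, which is consistent with the n = 1048576 entry being NA.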
> >
> > I ran this with 64-bit R on a Linux box. The NAs mark sizes that did not
> > finish while I was at a 1 1/2 hour dentist's appointment. The timing function
> > was:
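> > test <- function(n = 2^(11:25))
> > {
> >     tf <- tempfile()
> >     on.exit(unlink(tf))
> >     # for each length n: dput() a numeric vector to a file, then time parse() on it
> >     t(sapply(n, function(n) {
> >         dput(log(seq_len(n)), file = tf)
> >         print(c(n = n, system.time(parse(file = tf))[1:3]))  # user, system, elapsed
> >     }))
> > }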
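A minimal sketch, not part of Bill's post: growth_exponent() is a hypothetical helper that fits log2(elapsed) against log2(n) by least squares, so the slope estimates k in elapsed ≈ c * n^k. It accepts a matrix shaped like the one test() returns (columns "n" and "elapsed"). Here it is fed the timings copied from the table above. Very short timings are dropped because timer resolution dominates them; the 0.05 s cutoff is an arbitrary choice.

```r
# Hypothetical helper (not base R): least-squares slope on the log2-log2 scale.
# 'res' needs columns "n" and "elapsed", like the matrix returned by test().
growth_exponent <- function(res, min_elapsed = 0.05) {
  # skip runs that did not finish (NA) and timings too short to be reliable
  ok <- !is.na(res[, "elapsed"]) & res[, "elapsed"] >= min_elapsed
  x <- log2(res[ok, "n"])
  y <- log2(res[ok, "elapsed"])
  unname(coef(lm(y ~ x))[2])  # estimated exponent k
}

# Elapsed times (seconds) copied from the table above
n <- 2^(11:25)
r2152 <- cbind(n = n, elapsed = c(0.003, 0.006, 0.013, 0.025, 0.050, 0.100,
                                  0.219, 0.808, 2.022, 4.918, 9.857, 22.916,
                                  49.671, 101.042, 512.719))
r302 <- cbind(n = n, elapsed = c(0.018, 0.065, 0.254, 1.067, 4.114, 16.236,
                                 66.013, 291.883, 1285.265,
                                 NA, NA, NA, NA, NA, NA))

k2152 <- growth_exponent(r2152)
k302 <- growth_exponent(r302)
print(round(c(R2.15.2 = k2152, R3.0.2 = k302), 2))
```

With these numbers the 3.0.2 slope comes out close to 2 and the 2.15.2 slope well below it; the jump at the largest 2.15.2 size pulls that fit somewhat above 1.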
> >
> > Bill Dunlap
> > Spotfire, TIBCO Software
> > wdunlap tibco.com
> >
> >
> >> -----Original Message-----
> >> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf
> >> Of Carl Witthoft
> >> Sent: Wednesday, October 30, 2013 5:29 AM
> >> To: r-help at r-project.org
> >> Subject: Re: [R] big speed difference in source btw. R 2.15.2 and R 3.0.2 ?
> >>
> >> Did you run the identical code on the identical machine, and did you verify
> >> there were no other tasks running which might have limited the RAM available
> >> to R? And equally important, did you run these tests in the reverse order
> >> (in case R was storing large objects from the first run, thus chewing up
> >> RAM)?
> >>
> >>
> >>
> >> Dear All,
> >>
> >> Is it known that source() runs much faster in R 2.15.2 than in R 3.0.2?
> >> In the example below I observe, e.g. for a data.frame with 10^7 rows, the
> >> following timings:
> >>
> >> R version 2.15.2 Patched (2012-11-29 r61184)
> >> length: 1e+07
> >> user system elapsed
> >> 62.04 0.22 62.26
> >>
> >> R version 3.0.2 Patched (2013-10-27 r64116)
> >> length: 1e+07
> >> user system elapsed
> >> 388.63 176.42 566.41
> >>
> >> Is there a way to speed R version 3.0.2 up to the performance of R
> >> version 2.15.2?
> >>
> >> best regards,
> >>
> >> Heinz Tüchler
> >>
> >>
> >> example:
> >> sessionInfo()
> >> sample.vec <-
> >>   c('source', 'causes', 'R', 'to', 'accept', 'its', 'input', 'from', 'the',
> >>     'named', 'file', 'or', 'URL', 'or', 'connection')
> >> dmp.size <- 10^(1:7)
> >> set.seed(37)
> >>
> >> # for each size: dump a one-column data.frame to a file, then time source()
> >> for (i in dmp.size) {
> >>     df0 <- data.frame(x = sample(sample.vec, i, replace = TRUE))
> >>     dump('df0', file = 'testdump')
> >>     cat('length:', i, '\n')
> >>     print(system.time(source('testdump', keep.source = FALSE,
> >>                              encoding = '')))
> >> }
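Since the slowdown was traced to parse(), a minimal sketch (not from the thread) that times the two main steps behind source() separately on the same kind of data: parse() of the dumped file, then evaluation of the parsed expressions in a fresh environment. source() also does other work (encoding handling, optional echo), so the two parts need not add up exactly to a source() timing. The size 1e5 and the temporary file are arbitrary choices.

```r
# Same kind of data as the example above, at one arbitrary size
sample.vec <- c('source', 'causes', 'R', 'to', 'accept', 'its', 'input',
                'from', 'the', 'named', 'file', 'or', 'URL', 'or', 'connection')
set.seed(37)
df0 <- data.frame(x = sample(sample.vec, 1e5, replace = TRUE))

tf <- tempfile(fileext = ".R")
dump("df0", file = tf)

# Step 1: parse the dumped text into unevaluated expressions
t_parse <- system.time(exprs <- parse(file = tf, keep.source = FALSE))

# Step 2: evaluate the parsed expressions in a fresh environment
env <- new.env()
t_eval <- system.time(for (e in exprs) eval(e, envir = env))

# user, system and elapsed seconds for each step
print(rbind(parse = t_parse[1:3], eval = t_eval[1:3]))
unlink(tf)
```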
> >>
> >> output for R version 2.15.2 Patched (2012-11-29 r61184):
> >>> sessionInfo()
> >> R version 2.15.2 Patched (2012-11-29 r61184)
> >> Platform: x86_64-w64-mingw32/x64 (64-bit)
> >>
> >> locale:
> >> [1] LC_COLLATE=German_Switzerland.1252 LC_CTYPE=German_Switzerland.1252
> >> [3] LC_MONETARY=German_Switzerland.1252 LC_NUMERIC=C
> >> [5] LC_TIME=German_Switzerland.1252
> >>
> >> attached base packages:
> >> [1] stats graphics grDevices utils datasets methods base
> >> length: 10
> >> user system elapsed
> >> 0 0 0
> >> length: 100
> >> user system elapsed
> >> 0 0 0
> >> length: 1000
> >> user system elapsed
> >> 0 0 0
> >> length: 10000
> >> user system elapsed
> >> 0.02 0.00 0.01
> >> length: 1e+05
> >> user system elapsed
> >> 0.21 0.00 0.20
> >> length: 1e+06
> >> user system elapsed
> >> 4.47 0.04 4.51
> >> length: 1e+07
> >> user system elapsed
> >> 62.04 0.22 62.26
> >>>
> >>
> >>
> >> output for R version 3.0.2 Patched (2013-10-27 r64116):
> >>> sessionInfo()
> >> R version 3.0.2 Patched (2013-10-27 r64116)
> >> Platform: x86_64-w64-mingw32/x64 (64-bit)
> >>
> >> locale:
> >> [1] LC_COLLATE=German_Switzerland.1252 LC_CTYPE=German_Switzerland.1252
> >> [3] LC_MONETARY=German_Switzerland.1252 LC_NUMERIC=C
> >> [5] LC_TIME=German_Switzerland.1252
> >>
> >> attached base packages:
> >> [1] stats graphics grDevices utils datasets methods base
> >> length: 10
> >> user system elapsed
> >> 0 0 0
> >> length: 100
> >> user system elapsed
> >> 0 0 0
> >> length: 1000
> >> user system elapsed
> >> 0 0 0
> >> length: 10000
> >> user system elapsed
> >> 0.01 0.00 0.01
> >> length: 1e+05
> >> user system elapsed
> >> 0.36 0.06 0.42
> >> length: 1e+06
> >> user system elapsed
> >> 6.02 1.86 7.88
> >> length: 1e+07
> >> user system elapsed
> >> 388.63 176.42 566.41
> >>>
> >>
> >>
> >>
> >>
> >>
> >> --
> >> View this message in context: http://r.789695.n4.nabble.com/big-speed-difference-in-source-btw-R-2-15-2-and-R-3-0-2-tp4679314p4679346.html
> >> Sent from the R help mailing list archive at Nabble.com.
> >>
> >> ______________________________________________
> >> R-help at r-project.org mailing list
> >> https://stat.ethz.ch/mailman/listinfo/r-help
> >> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> >> and provide commented, minimal, self-contained, reproducible code.
> >