[R] big speed difference in source btw. R 2.15.2 and R 3.0.2 ?

William Dunlap wdunlap at tibco.com
Wed Oct 30 22:15:20 CET 2013


I have to defer to others for policy declarations like how long
the current format used by load and save should be readable.
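The speed gap Heinz mentions between dump/source and save/load can be seen directly. The sketch below round-trips a data frame whose columns carry extra attributes through both routes and times only the reading step. It assumes a current R session, and the attribute names `label` and `units` are hypothetical examples. It measures speed and attribute preservation only. It says nothing about how long the save/load format will stay readable, which is the policy question left open above.

```r
## Minimal sketch: compare dump()/source() with save()/load() for a data
## frame whose columns carry extra attributes. The attribute names "label"
## and "units" are hypothetical, chosen only for illustration.
set.seed(1)
df0 <- data.frame(x = rnorm(1e5),
                  g = sample(c("a", "b", "c"), 1e5, replace = TRUE))
attr(df0$x, "label") <- "a measurement"
attr(df0$x, "units") <- "mm"

txt <- tempfile(fileext = ".R")      # text route: R code written by dump()
bin <- tempfile(fileext = ".RData")  # binary route: serialized by save()
dump("df0", file = txt)
save(df0, file = bin)

## Read each file back into its own environment, timing only the reading.
e_src  <- new.env()
e_load <- new.env()
t_src  <- system.time(source(txt, local = e_src))
t_load <- system.time(load(bin, envir = e_load))
print(rbind(source = t_src[1:3], load = t_load[1:3]))

## Both routes should restore the per-column attributes.
stopifnot(identical(attr(e_load$df0$x, "units"), "mm"),
          identical(attr(e_src$df0$x, "units"), "mm"))
unlink(c(txt, bin))
```

save/load restores the object bit for bit, while the dump/source copy goes through deparsed text, so its doubles are only guaranteed to be equal up to the printed precision.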

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com


> -----Original Message-----
> From: Heinz Tuechler [mailto:tuechler at gmx.at]
> Sent: Wednesday, October 30, 2013 1:43 PM
> To: William Dunlap
> Cc: Carl Witthoft; r-help at r-project.org
> Subject: Re: [R] big speed difference in source btw. R 2.15.2 and R 3.0.2 ?
> 
> Many thanks for confirming my impression. I use dump for storing large
> data.frames with a number of attributes for each variable. save/load is
> much faster, but I am unsure whether such files will be readable by R
> versions years later.
> What format/functions would you suggest for data storage/transfer
> between different (future) R versions?
> 
> best regards,
> Heinz
> 
> On 30.10.2013 20:11, William Dunlap wrote:
> > I see a big 2.15.2/3.0.2 speed difference in parse() (which is used by source())
> > when it is parsing long vectors of numeric data.  dump/source has never been an
> > efficient way of transferring data between different R sessions, but it is much
> > worse now for long vectors.  In 2.15.2 doubling the size of the vector (of lengths
> > in the range 10^4 to 10^7) makes the time to parse go up by a factor of c. 2.1.
> > In 3.0.2 that factor is more like 4.4.
> >
> >         n elapsed-2.15.2 elapsed-3.0.2
> >      2048          0.003         0.018
> >      4096          0.006         0.065
> >      8192          0.013         0.254
> >     16384          0.025         1.067
> >     32768          0.050         4.114
> >     65536          0.100        16.236
> >    131072          0.219        66.013
> >    262144          0.808       291.883
> >    524288          2.022      1285.265
> >   1048576          4.918            NA
> >   2097152          9.857            NA
> >   4194304         22.916            NA
> >   8388608         49.671            NA
> > 16777216        101.042            NA
> > 33554432        512.719            NA
> >
> > I tried this with 64-bit R on a Linux box.  The NAs represent sizes that did not
> > finish while I was at a 1 1/2-hour dentist's appointment.  The timing function
> > was:
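> >    test <- function(n = 2^(11:25))
> >    {
> >        tf <- tempfile()
> >        on.exit(unlink(tf))                  # delete the temp file when done
> >        t(sapply(n, function(n){
> >            dput(log(seq_len(n)), file=tf)   # write a length-n double vector as R code
> >            # time parse() alone and report user, system and elapsed seconds
> >            print(c(n=n, system.time(parse(file=tf))[1:3]))
> >        }))
> >    }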
> >
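Bill's growth factors can be re-derived from the elapsed times (seconds) in his table. The sketch below copies those figures, computes the ratio between successive timings (n doubles at each step, so a ratio near 2 suggests linear time and near 4 suggests quadratic time), and fits the exponent k in time ~ c * n^k on a log-log scale. It uses only the published numbers, so it inherits their noise, for example the jump at the largest 2.15.2 size.

```r
## Elapsed times in seconds, copied from Bill's table; NA = did not finish.
n      <- 2^(11:25)
t_2152 <- c(0.003, 0.006, 0.013, 0.025, 0.050, 0.100, 0.219, 0.808, 2.022,
            4.918, 9.857, 22.916, 49.671, 101.042, 512.719)
t_302  <- c(0.018, 0.065, 0.254, 1.067, 4.114, 16.236, 66.013, 291.883,
            1285.265, rep(NA, 6))

## Ratio of each timing to the previous one (n doubles at every step):
## about 2 suggests linear growth, about 4 suggests quadratic growth.
growth <- function(t) t[-1] / t[-length(t)]
print(round(cbind(n = n[-1], r2152 = growth(t_2152), r302 = growth(t_302)), 2))

## Least-squares slope of log2(time) on log2(n) over the finished runs,
## i.e. the empirical exponent k in time ~ c * n^k.
slope <- function(t) {
  ok <- !is.na(t)
  unname(coef(lm(log2(t[ok]) ~ log2(n[ok])))[2])
}
slopes <- c("R 2.15.2" = slope(t_2152), "R 3.0.2" = slope(t_302))
print(round(slopes, 2))
```

On these figures the fitted exponent for 3.0.2 comes out close to 2, while the one for 2.15.2 is much nearer 1 (its last point, 512.719 s, pulls it somewhat upward), which is consistent with the factors of about 4.4 and 2.1 quoted above.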
> > Bill Dunlap
> > Spotfire, TIBCO Software
> > wdunlap tibco.com
> >
> >
> >> -----Original Message-----
> >> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of Carl Witthoft
> >> Sent: Wednesday, October 30, 2013 5:29 AM
> >> To: r-help at r-project.org
> >> Subject: Re: [R] big speed difference in source btw. R 2.15.2 and R 3.0.2 ?
> >>
> >> Did you run the identical code on the identical machine, and did you verify
> >> there were no other tasks running which might have limited the RAM available
> >> to R?  And equally important, did you run these tests in the reverse order
> >> (in case R was storing large objects from the first run, thus chewing up
> >> RAM)?
> >>
> >>
> >>
> >> Dear All,
> >>
> >> Is it known that source() runs much faster in R 2.15.2 than in R 3.0.2?
> >> In the example below I observe, e.g. for a data.frame with 10^7 rows, the
> >> following timings:
> >>
> >> R version 2.15.2 Patched (2012-11-29 r61184)
> >> length: 1e+07
> >>      user  system elapsed
> >>     62.04    0.22   62.26
> >>
> >> R version 3.0.2 Patched (2013-10-27 r64116)
> >> length: 1e+07
> >>      user  system elapsed
> >>    388.63  176.42  566.41
> >>
> >> Is there a way to speed R version 3.0.2 up to the performance of R
> >> version 2.15.2?
> >>
> >> best regards,
> >>
> >> Heinz Tüchler
> >>
> >>
> >> example:
> >> sessionInfo()
> >> sample.vec <-
> >>     c('source', 'causes', 'R', 'to', 'accept', 'its', 'input', 'from', 'the',
> >>       'named', 'file', 'or', 'URL', 'or', 'connection')
> >> dmp.size <- c(10^(1:7))
> >> set.seed(37)
> >>
> >> for(i in dmp.size) {
> >>     df0 <- data.frame(x=sample(sample.vec, i, replace=TRUE))
> >>     dump('df0', file='testdump')
> >>     cat('length:', i, '\n')
> >>     print(system.time(source('testdump', keep.source = FALSE,
> >>                              encoding='')))
> >> }
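Bill's measurements point at parse(), which source() runs before anything is evaluated. The sketch below, assuming the same kind of dumped data frame as in the example above but at a single size (1e5 rows) and written to a temporary file instead of 'testdump', times parsing and evaluation separately so the two costs can be compared on a given R version.

```r
## Sketch: split the cost of source() into parsing and evaluation.
## Uses one size (1e5 rows) of the data frame from the example above.
sample.vec <- c('source', 'causes', 'R', 'to', 'accept', 'its', 'input',
                'from', 'the', 'named', 'file', 'or', 'URL', 'or',
                'connection')
set.seed(37)
df0 <- data.frame(x = sample(sample.vec, 1e5, replace = TRUE))

tf <- tempfile(fileext = ".R")
dump('df0', file = tf)

## Step 1: parse the dumped text into an expression vector.
t_parse <- system.time(exprs <- parse(file = tf, keep.source = FALSE))

## Step 2: evaluate the parsed expressions in a fresh environment.
env <- new.env()
t_eval <- system.time(for (e in exprs) eval(e, envir = env))

print(rbind(parse = t_parse[1:3], eval = t_eval[1:3]))
unlink(tf)
```

If the parse row dominates and grows roughly fourfold each time the size doubles, the slowdown lies in the parser rather than in rebuilding the data frame.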
> >>
> >> output for R version 2.15.2 Patched (2012-11-29 r61184):
> >>> sessionInfo()
> >> R version 2.15.2 Patched (2012-11-29 r61184)
> >> Platform: x86_64-w64-mingw32/x64 (64-bit)
> >>
> >> locale:
> >> [1] LC_COLLATE=German_Switzerland.1252  LC_CTYPE=German_Switzerland.1252
> >> [3] LC_MONETARY=German_Switzerland.1252 LC_NUMERIC=C
> >> [5] LC_TIME=German_Switzerland.1252
> >>
> >> attached base packages:
> >> [1] stats     graphics  grDevices utils     datasets  methods   base
> >>> sample.vec <-
> >> +   c('source', 'causes', 'R', 'to', 'accept', 'its', 'input', 'from', 'the',
> >> +     'named', 'file', 'or', 'URL', 'or', 'connection')
> >>> dmp.size <- c(10^(1:7))
> >>> set.seed(37)
> >>>
> >>> for(i in dmp.size) {
> >> +   df0 <- data.frame(x=sample(sample.vec, i, replace=TRUE))
> >> +   dump('df0', file='testdump')
> >> +   cat('length:', i, '\n')
> >> +   print(system.time(source('testdump', keep.source = FALSE,
> >> +                            encoding='')))
> >> + }
> >> length: 10
> >>      user  system elapsed
> >>         0       0       0
> >> length: 100
> >>      user  system elapsed
> >>         0       0       0
> >> length: 1000
> >>      user  system elapsed
> >>         0       0       0
> >> length: 10000
> >>      user  system elapsed
> >>      0.02    0.00    0.01
> >> length: 1e+05
> >>      user  system elapsed
> >>      0.21    0.00    0.20
> >> length: 1e+06
> >>      user  system elapsed
> >>      4.47    0.04    4.51
> >> length: 1e+07
> >>      user  system elapsed
> >>     62.04    0.22   62.26
> >>>
> >>
> >>
> >> output for R version 3.0.2 Patched (2013-10-27 r64116):
> >>> sessionInfo()
> >> R version 3.0.2 Patched (2013-10-27 r64116)
> >> Platform: x86_64-w64-mingw32/x64 (64-bit)
> >>
> >> locale:
> >> [1] LC_COLLATE=German_Switzerland.1252  LC_CTYPE=German_Switzerland.1252
> >> [3] LC_MONETARY=German_Switzerland.1252 LC_NUMERIC=C
> >> [5] LC_TIME=German_Switzerland.1252
> >>
> >> attached base packages:
> >> [1] stats     graphics  grDevices utils     datasets  methods   base
> >>> sample.vec <-
> >> +   c('source', 'causes', 'R', 'to', 'accept', 'its', 'input', 'from', 'the',
> >> +     'named', 'file', 'or', 'URL', 'or', 'connection')
> >>> dmp.size <- c(10^(1:7))
> >>> set.seed(37)
> >>>
> >>> for(i in dmp.size) {
> >> +   df0 <- data.frame(x=sample(sample.vec, i, replace=TRUE))
> >> +   dump('df0', file='testdump')
> >> +   cat('length:', i, '\n')
> >> +   print(system.time(source('testdump', keep.source = FALSE,
> >> +                            encoding='')))
> >> + }
> >> length: 10
> >>      user  system elapsed
> >>         0       0       0
> >> length: 100
> >>      user  system elapsed
> >>         0       0       0
> >> length: 1000
> >>      user  system elapsed
> >>         0       0       0
> >> length: 10000
> >>      user  system elapsed
> >>      0.01    0.00    0.01
> >> length: 1e+05
> >>      user  system elapsed
> >>      0.36    0.06    0.42
> >> length: 1e+06
> >>      user  system elapsed
> >>      6.02    1.86    7.88
> >> length: 1e+07
> >>      user  system elapsed
> >>    388.63  176.42  566.41
> >>>
> >>
> >>
> >>
> >>
> >>
> >> --
> >> View this message in context: http://r.789695.n4.nabble.com/big-speed-difference-in-source-btw-R-2-15-2-and-R-3-0-2-tp4679314p4679346.html
> >> Sent from the R help mailing list archive at Nabble.com.
> >>
> >> ______________________________________________
> >> R-help at r-project.org mailing list
> >> https://stat.ethz.ch/mailman/listinfo/r-help
> >> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> >> and provide commented, minimal, self-contained, reproducible code.


