[R] big speed difference in source btw. R 2.15.2 and R 3.0.2 ?

William Dunlap wdunlap at tibco.com
Wed Oct 30 20:11:33 CET 2013


I see a big 2.15.2/3.0.2 speed difference in parse() (which is used by source())
when it is parsing long vectors of numeric data.  dump/source has never been an efficient
way of transferring data between different R sessions, but it is much worse
now for long vectors.  In 2.15.2 doubling the size of the vector (of lengths
in the range 10^4 to 10^7) makes the time to parse go up by a factor of c. 2.1,
i.e. roughly linear in the length.  In 3.0.2 that factor is more like 4.4,
i.e. roughly quadratic or worse.

       n elapsed-2.15.2 elapsed-3.0.2
    2048          0.003         0.018
    4096          0.006         0.065
    8192          0.013         0.254
   16384          0.025         1.067
   32768          0.050         4.114
   65536          0.100        16.236
  131072          0.219        66.013
  262144          0.808       291.883
  524288          2.022      1285.265
 1048576          4.918            NA
 2097152          9.857            NA
 4194304         22.916            NA
 8388608         49.671            NA
16777216        101.042            NA
33554432        512.719            NA
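
The doubling factors quoted above can be read off the table directly. The following is a minimal sketch, not part of the original timing run: the two vectors are copied from the table, and the names elapsed_2152, elapsed_302, doubling_factor and growth are illustrative only.

```r
# Elapsed times (seconds) copied from the table above; NA = did not finish.
n <- 2^(11:25)
elapsed_2152 <- c(0.003, 0.006, 0.013, 0.025, 0.050, 0.100, 0.219, 0.808,
                  2.022, 4.918, 9.857, 22.916, 49.671, 101.042, 512.719)
elapsed_302  <- c(0.018, 0.065, 0.254, 1.067, 4.114, 16.236, 66.013,
                  291.883, 1285.265, rep(NA, 6))

# Ratio of each timing to the timing for half the size.
doubling_factor <- function(x) x[-1] / x[-length(x)]

# A factor of 2 per doubling means linear time, a factor of 4 quadratic time;
# log2 of the factor is the apparent exponent k in time ~ n^k.
growth <- data.frame(n = n[-1],
                     factor_2152 = doubling_factor(elapsed_2152),
                     factor_302  = doubling_factor(elapsed_302))
growth$exponent_2152 <- log2(growth$factor_2152)
growth$exponent_302  <- log2(growth$factor_302)
print(round(growth, 2))
```

The very small timings at the top of the table are dominated by timer resolution, so the factors for the shortest vectors are less trustworthy than those further down.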

I tried this with 64-bit R on a Linux box; the times are elapsed seconds from
system.time().  The NAs represent sizes that did not finish while I was at a
1 1/2 hour dentist's appointment.  The timing function was:
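  test <- function(n = 2^(11:25))
  {
      tf <- tempfile()
      on.exit(unlink(tf))   # remove the temporary file when test() exits
      t(sapply(n, function(n){
          # write a length-n numeric vector as R source text, then time parsing it
          dput(log(seq_len(n)), file=tf)
          # print each row as it finishes so partial results survive a long run
          print(c(n=n, system.time(parse(file=tf))[1:3]))
      }))
  }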
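
Since dump/source is described above as an inefficient way to move data between sessions, one commonly used alternative is R's binary serialization, which does not go through parse() when the data is read back. The following is a minimal sketch, assuming the goal is only to transfer the object and not to keep a human-readable text file; the variable names are illustrative and no timings are claimed for it.

```r
# Illustrative only: move a long numeric vector between sessions with
# saveRDS()/readRDS() instead of dput()/parse() or dump()/source().
x <- log(seq_len(2^16))

tf <- tempfile(fileext = ".rds")
saveRDS(x, file = tf)   # binary serialization of the object
y <- readRDS(tf)        # reading it back does not involve the R parser
unlink(tf)

# The binary round trip reproduces the doubles exactly.
stopifnot(identical(x, y))
```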

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com


> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf
> Of Carl Witthoft
> Sent: Wednesday, October 30, 2013 5:29 AM
> To: r-help at r-project.org
> Subject: Re: [R] big speed difference in source btw. R 2.15.2 and R 3.0.2 ?
> 
> Did you run the identical code on the identical machine, and did you verify
> there were no other tasks running which might have limited the RAM available
> to R?  And equally important, did you run these tests in the reverse order
> (in case R was storing large objects from the first run, thus chewing up
> RAM)?
> 
> 
> 
> Dear All,
> 
> Is it known that source() works much faster in R 2.15.2 than in R 3.0.2?
> In the example below I observe, e.g. for a data.frame with 10^7 rows, the
> following timings:
> 
> R version 2.15.2 Patched (2012-11-29 r61184)
> length: 1e+07
>     user  system elapsed
>    62.04    0.22   62.26
> 
> R version 3.0.2 Patched (2013-10-27 r64116)
> length: 1e+07
>     user  system elapsed
>   388.63  176.42  566.41
> 
> Is there a way to speed R version 3.0.2 up to the performance of R
> version 2.15.2?
> 
> best regards,
> 
> Heinz Tüchler
> 
> 
> example:
> sessionInfo()
> sample.vec <-
>    c('source', 'causes', 'R', 'to', 'accept', 'its', 'input', 'from', 'the',
>      'named', 'file', 'or', 'URL', 'or', 'connection')
> dmp.size <- c(10^(1:7))
> set.seed(37)
> 
> for(i in dmp.size) {
>    df0 <- data.frame(x=sample(sample.vec, i, replace=TRUE))
>    dump('df0', file='testdump')
>    cat('length:', i, '\n')
>    print(system.time(source('testdump', keep.source = FALSE,
>                             encoding='')))
> }
> 
> output for R version 2.15.2 Patched (2012-11-29 r61184):
> > sessionInfo()
> R version 2.15.2 Patched (2012-11-29 r61184)
> Platform: x86_64-w64-mingw32/x64 (64-bit)
> 
> locale:
> [1] LC_COLLATE=German_Switzerland.1252  LC_CTYPE=German_Switzerland.1252
> [3] LC_MONETARY=German_Switzerland.1252 LC_NUMERIC=C
> [5] LC_TIME=German_Switzerland.1252
> 
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
> > sample.vec <-
> +   c('source', 'causes', 'R', 'to', 'accept', 'its', 'input', 'from', 'the',
> +     'named', 'file', 'or', 'URL', 'or', 'connection')
> > dmp.size <- c(10^(1:7))
> > set.seed(37)
> >
> > for(i in dmp.size) {
> +   df0 <- data.frame(x=sample(sample.vec, i, replace=TRUE))
> +   dump('df0', file='testdump')
> +   cat('length:', i, '\n')
> +   print(system.time(source('testdump', keep.source = FALSE,
> +                            encoding='')))
> + }
> length: 10
>     user  system elapsed
>        0       0       0
> length: 100
>     user  system elapsed
>        0       0       0
> length: 1000
>     user  system elapsed
>        0       0       0
> length: 10000
>     user  system elapsed
>     0.02    0.00    0.01
> length: 1e+05
>     user  system elapsed
>     0.21    0.00    0.20
> length: 1e+06
>     user  system elapsed
>     4.47    0.04    4.51
> length: 1e+07
>     user  system elapsed
>    62.04    0.22   62.26
> >
> 
> 
> output for R version 3.0.2 Patched (2013-10-27 r64116):
> > sessionInfo()
> R version 3.0.2 Patched (2013-10-27 r64116)
> Platform: x86_64-w64-mingw32/x64 (64-bit)
> 
> locale:
> [1] LC_COLLATE=German_Switzerland.1252  LC_CTYPE=German_Switzerland.1252
> [3] LC_MONETARY=German_Switzerland.1252 LC_NUMERIC=C
> [5] LC_TIME=German_Switzerland.1252
> 
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
> > sample.vec <-
> +   c('source', 'causes', 'R', 'to', 'accept', 'its', 'input', 'from', 'the',
> +     'named', 'file', 'or', 'URL', 'or', 'connection')
> > dmp.size <- c(10^(1:7))
> > set.seed(37)
> >
> > for(i in dmp.size) {
> +   df0 <- data.frame(x=sample(sample.vec, i, replace=TRUE))
> +   dump('df0', file='testdump')
> +   cat('length:', i, '\n')
> +   print(system.time(source('testdump', keep.source = FALSE,
> +                            encoding='')))
> + }
> length: 10
>     user  system elapsed
>        0       0       0
> length: 100
>     user  system elapsed
>        0       0       0
> length: 1000
>     user  system elapsed
>        0       0       0
> length: 10000
>     user  system elapsed
>     0.01    0.00    0.01
> length: 1e+05
>     user  system elapsed
>     0.36    0.06    0.42
> length: 1e+06
>     user  system elapsed
>     6.02    1.86    7.88
> length: 1e+07
>     user  system elapsed
>   388.63  176.42  566.41
> >

