[Rd] R as a scripting engine
Dirk Eddelbuettel
edd at debian.org
Tue Jan 13 19:25:06 CET 2009
Hi Simon,
On 13 January 2009 at 12:36, Simon Urbanek wrote:
| Oh, well, now that the post count is growing I guess I have to
| respond ;).
|
| On Jan 11, 2009, at 15:50 , Dirk Eddelbuettel wrote:
|
| >
| > On 11 January 2009 at 20:18, Prof Brian Ripley wrote:
| > | Those of you tracking R development will have noticed that we are
| > | moving towards using R as a scripting engine.
| > [...]
| > | Reasons:
| > |
| > | - it is platform-independent and needs no other tools installed.
| > [...]
| > | - it is fast.
| > [...]
| >
| > Indeed. I really like working with r scripts.
| >
| > And littler by Horner and Eddelbuettel is faster than Rscript -- see
| > eg the
| > scripts tests/timing.sh and tests/timing2.sh in the SVN archive /
| > littler
| > tarballs (and the results below for illustration).
| >
|
| Well, if we enter that territory then rcmd from Rserve is much faster
| than littler:
|
| --- GNU bc doing the addition 10 times
| real 0m0.029s
| user 0m0.009s
| sys 0m0.028s
|
| --- rcmd doing the addition 10 times
| real 0m0.090s
| user 0m0.010s
| sys 0m0.022s
|
| --- our r doing the addition 10 times
| real 0m0.294s
| user 0m0.199s
| sys 0m0.091s
|
| --- GNU R's Rscript doing the addition 10 times
| real 0m1.626s
| user 0m1.241s
| sys 0m0.357s
|
| --- GNU R doing the addition 10 times
| real 0m2.883s
| user 0m2.424s
| sys 0m0.426s
|
| (littler and timing.sh script from http://littler.googlecode.com/svn/trunk
| with loopRcmd added)
|
|
| Yes, the comparison is unfair, but that applies to littler and Rscript
| as well.
It's a fair point. The test in timing.sh skews towards startup and
initialization costs, and Rserve ought to win against the others, which start
up on every call, if it only starts once :)
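To make the startup-cost point concrete, here is a minimal shell sketch in the spirit of timing.sh. It is not the actual script from the littler SVN tree, and the function name time_repeated is made up for illustration. It launches the same command N times, each in a fresh process, so for something as small as an addition nearly all the elapsed time is process startup:

```shell
#!/bin/sh
# Hypothetical helper, not taken from littler's tests/timing.sh.
# Runs the given command N times, each as a fresh process, and
# prints how many of those runs exited successfully.
time_repeated() {
    n=$1; shift
    ok=0
    i=0
    while [ "$i" -lt "$n" ]; do
        # Discard output and detach stdin so only startup + work is measured.
        if "$@" </dev/null >/dev/null 2>&1; then
            ok=$((ok + 1))
        fi
        i=$((i + 1))
    done
    echo "$ok"
}

# Example (requires R; in bash, `time` is a keyword and can wrap a function):
#   time time_repeated 10 Rscript -e 'cat(2 + 2)'
```

A server that is started once and then queried ten times pays the startup cost a single time, which is why it is expected to come out ahead in a loop like this.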
| Once you start making direct comparisons the story is a bit
| different:
|
| --- our r doing the addition 10 times
| real 0m0.297s
| user 0m0.200s
| sys 0m0.091s
|
| --- GNU R's Rscript doing the addition 10 times
| real 0m0.390s
| user 0m0.219s
| sys 0m0.163s
|
| (Rscript is now run with R_DEFAULT_PACKAGES=NULL since that is what
| littler really does).
I think that is incorrect. For littler, the code is shared between autoloads.R /
autoloads.h and littler.c. We always load the packages given by

    dp <- getOption("defaultPackages")

and hence currently (from autoloads.h in your build directory, with the data
read by autoloads() in littler.c)

    char *pack[] = {
        "datasets",
        "utils",
        "grDevices",
        "graphics",
        "stats",
        "methods"
    };

so I believe the following results could be 'improved' on littler's side as
well if I chose to skip loading the default packages.
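For anyone who wants to see what the R_DEFAULT_PACKAGES=NULL setting mentioned above changes on the Rscript side, a small sketch follows. The helper name with_default_packages is hypothetical; the only thing assumed about R is that it reads the R_DEFAULT_PACKAGES environment variable at startup, as a comma-separated package list or the literal NULL:

```shell
#!/bin/sh
# Hypothetical helper: run a command with R_DEFAULT_PACKAGES set to VALUE.
# VALUE is a comma-separated package list, or NULL to attach only base.
with_default_packages() {
    value=$1; shift
    env R_DEFAULT_PACKAGES="$value" "$@"
}

# Examples (require R); search() lists the attached packages:
#   with_default_packages NULL Rscript -e 'print(search())'
#   with_default_packages datasets,utils,grDevices,graphics,stats,methods \
#       Rscript -e 'print(search())'
```

Timing the two calls side by side gives a rough idea of how much of the startup time goes into attaching the default packages rather than into the front end itself.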
| .. and for timing2.sh (again, comparing Rcmd is technically a bit
| unfair, but the net effect for the user is good):
|
| --- rcmd calling summary() 20 times
| real 0m0.439s
| user 0m0.019s
| sys 0m0.045s
|
| --- our r calling summary() 20 times
| real 0m2.619s
| user 0m2.089s
| sys 0m0.476s
|
| --- GNU R's Rscript calling summary() 20 times
| real 0m2.435s
| user 0m1.793s
| sys 0m0.590s
|
| --- GNU R calling summary() 20 times
| real 0m5.789s
| user 0m4.892s
| sys 0m0.829s
|
| so in fact Rscript is here faster than littleR!
|
| (R 2.8.1 32-bit, Mac OS X 10.5.6, bash shell (due to echo bug), Quad
| 2.66GHz Xeon, littler from SVN, rcmd from SVN, Rserve 0.5-2)
|
|
| > We would still appreciate it if you could finally acknowledge the
| > existence of littler in the R / Rscript documentation. You are
| > not doing users any service by pretending it doesn't exist.
| >
|
| I don't think anyone is pretending anything. If users want littler
| specifically, they will find it, but I don't see why it should have
| anything to do with the R documentation - it's not part of R ...
All I suggested in the past was a nod towards littler's existence. Littler
did come first, and (under an 'apples to apples' comparison) still starts
faster (as it does its work differently, which also makes it e.g. less easily
portable to Windows).
It is after all fairly common to credit previous related work.
| Cheers,
| S
|
| PS: No, I'm not advertising Rserve here - it has its own place, but I
| wouldn't really use it for running random scripts. I have added it to
| the mix just to show that there is always a tradeoff, so it's
| important to know what is compared in benchmarks...
I have gotten really used to always leaving emacs running and starting client
sessions without delay (besides being able to continue working remotely in
existing sessions). In that sense a hybrid command-line version of something
like littler that talks to an Rserve instance is not a bad idea at all -- in
particular if speed were everything.
Dirk
|
|
| > That said, we are not (yet ?) building r for Windows, and I
| > appreciate that Rscript is available there. Maintenance and use of
| > R will be easier with a consistent set of tools. This is a good move.
| >
| > Dirk
| >
| >
| > edd at ron:~/svn/littler/tests> ./timing.sh
| >
| > --- GNU bc doing the addition 10 times
| > real 0m0.028s
| > user 0m0.004s
| > sys 0m0.012s
| >
| > --- our r doing the addition 10 times
| > real 0m0.400s
| > user 0m0.308s
| > sys 0m0.052s
| >
| > --- GNU R's Rscript doing the addition 10 times
| > real 0m2.077s
| > user 0m1.832s
| > sys 0m0.204s
| >
| > --- GNU R doing the addition 10 times
| > real 0m3.974s
| > user 0m3.728s
| > sys 0m0.228s
| > edd at ron:~/svn/littler/tests> ./timing2.sh
| >
| > --- our r calling summary() 20 times
| > real 0m3.261s
| > user 0m2.976s
| > sys 0m0.240s
| >
| > --- GNU R's Rscript calling summary() 20 times
| > real 0m4.164s
| > user 0m3.624s
| > sys 0m0.548s
| >
| > --- GNU R calling summary() 20 times
| > real 0m8.087s
| > user 0m7.552s
| > sys 0m0.492s
| > edd at ron:~/svn/littler/tests>
| >
| >
| > --
| > Three out of two people have difficulties with fractions.
| >
| > ______________________________________________
| > R-devel at r-project.org mailing list
| > https://stat.ethz.ch/mailman/listinfo/r-devel
| >
| >
|
--
Three out of two people have difficulties with fractions.