[R] R badly lags matlab on performance?
luke at stat.uiowa.edu
Sun Jan 4 01:02:33 CET 2009
On Sat, 3 Jan 2009, Duncan Murdoch wrote:
> On 03/01/2009 1:37 PM, Ajay Shah wrote:
>>> As for jit and Ra, that was my immediate reaction too, but I found
>>> that jit does not help on your example. But I concur fully with what
>>> Ben said --- use the tool that is appropriate for the task at hand.
>>> If your task is running for loops, Matlab does it faster, and you
>>> have Matlab, well then you should by all means use Matlab.
>>
>> A good chunk of statistical computation involves loops. We are all
>> happy R users. I was surprised to see that we are so far from matlab
>> in the crucial dimension of performance.
>>
>
> I don't know Matlab, but I think the thing that is slowing R down here is its
> generality. When you write
>
> a[i] <- a[i] + 1
>
> in R, it could potentially change the meaning of a, [, <-, and + on each step
> through the loop, so R looks them up again each time. I would guess that's
> not possible in Matlab, or perhaps Matlab has an optimizer that can recognize
> that in the context where the loop is being evaluated, those changes are
> known not to happen.
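Duncan's point can be made concrete. Here is a contrived sketch (the
rebinding of `+` is artificial, purely to show why the interpreter must
re-resolve these names on every pass through the loop):

```r
f <- function() {
  a <- c(1, 2, 3)
  for (i in seq_along(a)) {
    a[i] <- a[i] + 1
    if (i == 1)
      # Rebind `+` mid-loop: from the next iteration on, "+" subtracts.
      # R must therefore look `+` up afresh on each iteration.
      `+` <- function(e1, e2) e1 - e2
  }
  a
}
f()  # 2 1 2: only the first iteration used the original `+`
```

Because such a rebinding is legal, the evaluator cannot in general cache
the meaning of `+`, `[`, or `<-` across loop iterations without proving
that no such assignment can occur.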
R's interpreter is fairly slow, due in large part to the allocation of
argument lists and the cost of variable lookups, including ones like
[<- that are assembled as strings and looked up on every call.
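For readers unfamiliar with the mechanism: a subscript assignment is
sugar for a call to the replacement function `[<-`, which is resolved on
every call. A sketch of the equivalence (the real evaluator routes this
through a temporary `*tmp*` variable, but the replacement-function call
is the same):

```r
a <- c(10, 20, 30)
a[2] <- 99                 # the usual form
b <- c(10, 20, 30)
b <- `[<-`(b, 2, 99)       # roughly what the evaluator does
identical(a, b)            # TRUE
```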
> It *would* be possible to write such an optimizer for
> R, and Luke Tierney's byte code compiler-in-progress might incorporate such a
> thing.
The current byte code compiler available from my web site speeds this
(highly artificial) example by about a factor of 4. The experimental
byte code engine I am currently working on (and that can't yet do much
more than an example like this) speeds this up by a factor of
80. Whether that level of improvement (for toy examples like this)
will remain once the engine is more complete and whether a reasonable
compiler can optimize down to the assembly code I used remain to be
seen.
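(That compiler later shipped as the base `compiler` package. With a
current R the loop from this thread can be byte-compiled directly; the
speedup observed will of course vary with R version and machine:)

```r
library(compiler)

f <- function(n) {
  a <- numeric(n)
  for (i in seq_len(n)) a[i] <- a[i] + 1
  a
}
fc <- cmpfun(f)            # byte-compile the function
identical(f(10), fc(10))   # TRUE: same result, typically faster
# system.time(f(1e6)); system.time(fc(1e6))  # compare timings yourself
```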
> For the difference in timing on the vectorized versions, I'd guess that
> Matlab uses a better compiler than gcc. It's also likely that R incorporates
> some unnecessary testing even in a case like this, because it's easier to
> maintain code that is obviously sane than it is to maintain code that may not
> be. R has a budget which is likely several orders of magnitude smaller than
> Mathworks has, so it makes sense to target our resources at more important
> issues than making fast things run a bit faster.
Another possibility is optimization settings that may be higher and/or
more processor-specific than those used by R.
We do handle the case where both arguments to + are scalar (i.e. of
length 1) separately, but I don't recall whether we do so for the
vector/scalar case as well -- I suspect not, as that would make the
code less maintainable for not a very substantial gain.
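The vector/scalar case is exactly where the general arithmetic code must
handle recycling, which a scalar/scalar fast path gets to skip entirely.
A small illustration of what the general path has to support:

```r
x <- c(1, 2, 3, 4)
x + 1        # scalar recycled against the vector: 2 3 4 5
x + c(1, 2)  # general recycling of a shorter vector: 2 4 4 6
```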
luke
> Duncan Murdoch
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
--
Luke Tierney
Chair, Statistics and Actuarial Science
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa
Department of Statistics and Actuarial Science
241 Schaeffer Hall, Iowa City, IA 52242
Phone: 319-335-3386   Fax: 319-335-3017
Email: luke at stat.uiowa.edu
WWW:   http://www.stat.uiowa.edu