[R] R badly lags matlab on performance?

Philippe Grosjean phgrosjean at sciviews.org
Sun Jan 4 11:02:04 CET 2009


I once wrote the benchmark mentioned in Stefan's post (based on initial 
work by Stephan Steinhaus), and it is still available for those who 
would like to update it. Note that it lacks checks on the results to 
make sure that the calculations are not only fast, but also correct!
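
A minimal sketch of the kind of check that is missing: time a calculation 
over several repetitions and also verify each result against a known 
reference. The helper name timed_check, the test case, and the tolerance 
are assumptions made for illustration; none of them comes from the 
original benchmark.

```r
# Hypothetical helper (not part of the original benchmark): run f()
# several times, record the elapsed time of each run, and compare each
# result with a known reference value.
timed_check <- function(f, reference, reps = 5, tol = 1e-8) {
  elapsed <- numeric(reps)
  ok <- logical(reps)
  for (i in seq_len(reps)) {
    elapsed[i] <- system.time(result <- f())[["elapsed"]]
    ok[i] <- isTRUE(all.equal(result, reference, tolerance = tol))
  }
  list(mean = mean(elapsed), sd = sd(elapsed), correct = all(ok))
}

# Illustrative test case: the sum of 1..n has the closed form n(n+1)/2.
n <- 1e6
res <- timed_check(function() sum(as.numeric(seq_len(n))), n * (n + 1) / 2)
res$correct  # TRUE only if every repetition matched the reference
```

A timing would then only be reported when res$correct is TRUE, so that a 
fast but wrong calculation cannot win.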

Now, I'll explain why I haven't updated it, and you'll see that it is 
connected with the current topic.

First, lack of time, for sure.

Second, this benchmark has always been heavily criticized by several 
people, including members of the R Core Team. Basically, these are just 
toy examples, disconnected from reality. Even with better cases, 
benchmarks do not take into account the time needed to write the code 
for your particular application (from the question to the results).

I wrote this benchmark at a time when I overemphasized the raw 
performance of the software, and when I was looking for the best 
software to choose as a tool for my future career.

Now, what's my choice, ten years later? Not two, not three software 
packages... but just ONE: R. I do about 95% of my calculations with R 
(the rest is ImageJ/Java). Indeed, these benchmark results (and the toy 
example of Ajay Shah, a <- a + 1) should be given only marginal weight, 
because what is important is how your software tool performs in real 
applications, not in simplistic toy examples.

R lags behind Matlab for pure arithmetic calculation... right! But R has 
a better object-oriented approach, features more variable types (factor, 
for instance), and has a richer mechanism for metadata handling (col/row 
names, various other attributes, ...), which makes it better suited than 
Matlab to representing complex datasets or analyses. Of course, this 
has a small cost in performance.
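
To make the factor and metadata points concrete, here is a small sketch. 
The species names, site names, and the "units" attribute are invented for 
illustration only.

```r
# A factor stores categorical data together with its full set of levels,
# including levels that were not observed in this sample.
species <- factor(c("cod", "herring", "cod"),
                  levels = c("cod", "herring", "sprat"))

# Row and column names travel with the matrix.
counts <- matrix(c(12, 5, 7, 3, 9, 4), nrow = 3,
                 dimnames = list(c("site1", "site2", "site3"),
                                 c("spring", "autumn")))

# Arbitrary attributes can be attached as extra metadata.
attr(counts, "units") <- "individuals per haul"

levels(species)            # all declared levels, observed or not
table(species)             # counts per level; unobserved levels count 0
counts["site2", "autumn"]  # index by name rather than by position
```

The names and attributes are stored with the object itself, which is part 
of the small performance cost mentioned above.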

As soon as you think about your problem in a vectorized way, R is one of 
the best tools, I think, to go "from the question to the answer" in real 
situations. How could we quantify this? I can only imagine big contests 
where experts in each language would be presented with real problems and 
one would measure the time needed to solve them. One should also 
measure the robustness, reusability, flexibility, and "elegance" of the 
code produced (how to quantify these?). Such a contest between R, 
Matlab, Octave, Scilab, etc. is very unlikely to happen.
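
As an illustration of what thinking in a vectorized way can look like in 
R, the sketch below writes the same calculation as an explicit loop and as 
a single vectorized expression. The function names, the calculation, and 
the vector length are arbitrary choices for the example. No timings are 
quoted because they depend on the machine and the R version.

```r
x <- runif(1e5)

# Loop version: element by element, as one might write it in C.
loop_version <- function(x) {
  out <- numeric(length(x))
  for (i in seq_along(x)) out[i] <- x[i]^2 + 1
  out
}

# Vectorized version: one expression applied to the whole vector.
vec_version <- function(x) x^2 + 1

# Check that both give the same result before comparing their speed.
stopifnot(isTRUE(all.equal(loop_version(x), vec_version(x))))

system.time(loop_version(x))
system.time(vec_version(x))
```

The vectorized form is also shorter to write and read, which matters for 
the "from the question to the answer" time as much as the run time does.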

In the end, it is really a matter of personal feeling: you can run your 
own little contest by trying to solve a given problem in several 
software packages... and then decide which one you prefer. I think many 
people do/did this, and the still exponential growth of R use (at least 
as observed through the increasing number of CRAN R packages) is 
probably a good sign that R is one of the top performers when it comes 
to efficiency "from the question to the answer" in real problems, not 
just on little toy examples!

(sorry for being so long; I think I have missed some interaction with 
the R community these days ;-)
Best,

Philippe

..............................................<°}))><........
  ) ) ) ) )
( ( ( ( (    Prof. Philippe Grosjean
  ) ) ) ) )
( ( ( ( (    Numerical Ecology of Aquatic Systems
  ) ) ) ) )   Mons-Hainaut University, Belgium
( ( ( ( (
..............................................................

Stefan Grosse wrote:
>> I don't have octave (on the same machine) to compare these with.
>> And I don't have MatLab at all. So I can't provide a comparison
>> on that front, I'm afraid.
>> Ted.
>>   
> 
> Just to add some timings, I was running 1000 repetitions (adding up to
> a=1001) on a notebook with core 2 duo T7200
> 
> R 2.8.1 on Fedora 10: mean 0.10967, st.dev 0.005238
> R 2.8.1 on Windows Vista: mean 0.13245, st.dev 0.00943
> 
> Octave 3.0.3 on Fedora 10: mean 0.097276, st.dev 0.0041296
> 
> Matlab 2008b on Windows Vista: mean 0.0626, st.dev 0.005
> 
> But I am not sure how representative this is with that very simple
> example. To compare Matlab speed with R, a kind of benchmark suite is
> necessary, like http://www.sciviews.org/benchmark/index.html, but that
> one is very old. I would guess that not much has changed: sometimes
> R is faster, sometimes not.
> 
> This difference between the Windows and Linux timing is probably not
> really relevant: when I was comparing the timings of my usual analysis
> there was no difference between the two operating systems. (count data
> and time series stuff)
> 
> Cheers
> Stefan
> 



