[R] R badly lags matlab on performance?
Philippe Grosjean
phgrosjean at sciviews.org
Sun Jan 4 11:02:04 CET 2009
I once wrote the benchmark mentioned in Stefan's post (based on initial
work by Stephan Steinhaus), and it is still available for those who
would like to update it. Note that it lacks any checking of the
results to make sure that a calculation is not only faster, but correct!
Now, I'll explain why I haven't updated it, and you'll see it is
connected with the current topic.
First, lack of time, for sure.
Second, this benchmark has always been heavily criticized by several
people, including members of the R Core Team. Basically, these are just
toy examples, disconnected from reality. Even with better cases,
benchmarks do not take into account the time needed to write the code
for your particular application (from the question to the results).
I wrote this benchmark at a time when I overemphasized the raw
performance of the software, a time when I was looking for the best
software to choose as a tool for my future career.
Now, what's my choice, ten years later? Not two, not three software
packages... but just ONE: R. I do about 95% of my calculations with R
(the rest is ImageJ/Java). Indeed, these benchmark results (and the toy
example of Ajay Shah, a <- a + 1) should carry only marginal weight,
because what is important is how your software tool performs in
real applications, not in simplistic toy examples.
R lags behind Matlab for pure arithmetic calculation... right! But R has
a better object-oriented approach, features more variable types (factor,
for instance), and has a richer mechanism for metadata handling (col/row
names, various other attributes, ...), which makes it better suited than
Matlab to represent complex datasets or analyses. Of course, this
has a small cost in performance.
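
As a minimal sketch of the factor type and the metadata handling
mentioned above (the data, the dimension names and the "units" attribute
are all invented for illustration):

```r
# Hypothetical categorical variable, invented for illustration only
site <- factor(c("river", "lake", "river", "pond"))
levels(site)   # the distinct categories are stored once, as levels
table(site)    # counts per category, with no extra bookkeeping

# Hypothetical count matrix carrying its own row and column names
counts <- matrix(1:6, nrow = 2,
                 dimnames = list(c("day1", "day2"),
                                 c("spA", "spB", "spC")))
counts["day2", "spB"]   # index by name rather than by position

# Arbitrary metadata can be attached as an attribute
# ("units" is just a label chosen here, not a special name in R)
attr(counts, "units") <- "individuals per litre"
attributes(counts)      # dim, dimnames and the custom attribute
```

The names and attributes travel with the object, which is the kind of
convenience that costs a little raw speed.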
As soon as you think about your problem in a vectorized way, R is one of
the best tools, I think, to go "from the question to the answer" in real
situations. How could we quantify this? The only way I can see is big
contests in which experts of each language are given real problems and
the time needed to solve them is measured. One should also measure the
robustness, reusability, flexibility, and "elegance" of the code
produced (how to quantify these?). Such a contest between R,
Matlab, Octave, Scilab, etc. is very unlikely to happen.
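
To illustrate what thinking in a vectorized way means, here is a small
sketch, assuming a simple element-wise increment as the task. The
function names loop_add and vec_add and the input vector are
hypothetical, and the timings depend entirely on your machine and R
version:

```r
x <- runif(1e6)   # hypothetical input vector

# Explicit loop: one interpreted iteration per element
loop_add <- function(v) {
  out <- numeric(length(v))
  for (i in seq_along(v)) out[i] <- v[i] + 1
  out
}

# Vectorized form: a single expression applied to the whole vector
vec_add <- function(v) v + 1

# Check correctness first, then compare speed
stopifnot(identical(loop_add(x), vec_add(x)))
system.time(loop_add(x))
system.time(vec_add(x))
```

Note that the results are checked before any timing is taken, since a
faster answer is only useful if it is also the right one.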
In the end, it is really a matter of personal feeling: you can run your
own little contest by trying to solve a given problem in
several software packages... and then decide which one you prefer. I
think many people do/did this, and the still exponential growth of R use
(at least, as it can be observed from the increasing number of CRAN
packages) is a good sign that R is probably one of the top performers
when it comes to efficiency "from the question to the answer" in real
problems, not just on little toy examples!
(Sorry for being so long; I think I have been missing some interaction
with the R community lately ;-)
Best,
Philippe
..............................................<°}))><........
) ) ) ) )
( ( ( ( ( Prof. Philippe Grosjean
) ) ) ) )
( ( ( ( ( Numerical Ecology of Aquatic Systems
) ) ) ) ) Mons-Hainaut University, Belgium
( ( ( ( (
..............................................................
Stefan Grosse wrote:
>> I don't have octave (on the same machine) to compare these with.
>> And I don't have MatLab at all. So I can't provide a comparison
>> on that front, I'm afraid.
>> Ted.
>>
>
> Just to add some timings, I was running 1000 repetitions (adding up to
> a=1001) on a notebook with core 2 duo T7200
>
> R 2.8.1 on Fedora 10: mean 0.10967, st.dev 0.005238
> R 2.8.1 on Windows Vista: mean 0.13245, st.dev 0.00943
>
> Octave 3.0.3 on Fedora 10: mean 0.097276, st.dev 0.0041296
>
> Matlab 2008b on Windows Vista: mean 0.0626, st.dev 0.005
>
> But I am not sure how representative this is with such a very simple
> example. To compare Matlab speed with R, some kind of benchmark suite
> is necessary, like http://www.sciviews.org/benchmark/index.html, but
> that one is very old. I would guess that not much has changed:
> sometimes R is faster, sometimes not.
>
> This difference between the Windows and Linux timing is probably not
> really relevant: when I was comparing the timings of my usual analysis
> there was no difference between the two operating systems. (count data
> and time series stuff)
>
> Cheers
> Stefan