[R] R badly lags matlab on performance? -- Define performance, please.
gunter.berton at gene.com
Sun Jan 4 21:03:01 CET 2009
Merely my opinions, of course ...
Just to amplify a little on Philippe's remarks by paraphrasing comments made
many times on this list before. In a galaxy far away a long time ago ...
John Chambers and his Bell Labs colleagues -- and subsequently R&R (Ross
Ihaka and Robert Gentleman) and R's Core Team Developers -- made the
decision to develop a language/software for data analysis, data graphics and
statistics. Recognizing that "most" tasks within this arena were for
"one-off" custom problems rather than repetitive "production" applications,
they emphasized flexibility, ease of use and relatively straightforward
extensibility. While I'm sure that they did not ignore performance, it was
not the primary consideration (Chambers et al.'s Blue Book speaks to these
issues much more eloquently; I think it should be required reading _BEFORE_
one launches into criticism). As has been frequently mentioned, they knew
that there are two "outs" for such matters: Moore's Law and the ability to
easily incorporate customized C code into R. I submit that the data bear out
the overwhelming wisdom of their choice.
This is not to say that R is perfect: there are certainly times when performance
is inadequate, and design or implementation could have been (or be)
improved. But no one bats a thousand (baseball idiom): as Philippe said, for
many (maybe most?) of us R is both awesome and indispensable!
For me the real challenge is: what's next? R/S is so blazingly successful
that it seems to extinguish the need for continuing improvement (the demise of
Luke Tierney's X-Lisp Stat is an example): what's the next step in the
sequence IMSL --> SAS ---> S/R --> ?? . But hopefully this is merely my
ignorance speaking, and smart folks are already working on it.
Regards to all,
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On
Behalf Of Philippe Grosjean
Sent: Sunday, January 04, 2009 2:02 AM
To: Stefan Grosse
Cc: r-help at stat.math.ethz.ch
Subject: Re: [R] R badly lags matlab on performance?
I once wrote the benchmark mentioned in Stefan's post (based on initial
work by Stephan Steinhaus), and it is still available for those who
would like to update it. Note that it is lacking some checking of the
results to make sure that calculation is not only faster, but correct!
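As a minimal sketch of what such a check could look like (this is not part of the original benchmark; the helper name `timed_check` and the matrix example are assumptions made purely for illustration), one can time a candidate computation and compare its result against a trusted reference in the same step:

```r
# Hypothetical helper: time a candidate computation and verify its result
# against a reference value computed by a simpler, trusted method.
timed_check <- function(fun, reference, tolerance = 1e-8) {
  elapsed <- system.time(result <- fun())[["elapsed"]]
  correct <- isTRUE(all.equal(result, reference, tolerance = tolerance))
  list(elapsed = elapsed, correct = correct)
}

set.seed(1)
X <- matrix(rnorm(200 * 50), nrow = 200)

# Reference: the textbook definition of the cross-product matrix.
reference <- t(X) %*% X

# Candidate: the specialised base R function for the same quantity.
res <- timed_check(function() crossprod(X), reference)

# A deliberately wrong candidate is flagged, however fast it runs.
bad <- timed_check(function() crossprod(X) + 1, reference)

res$correct
bad$correct
```

A timing is only reported alongside a pass/fail flag, so a fast but wrong result cannot slip into the comparison unnoticed.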
Now, I'll explain why I haven't updated it, and you'll see it is connected
with the current topic.
First, lack of time, for sure.
Second, this benchmark has always been heavily criticized by several people,
including members of the R Core Team. Basically, these are just toy examples,
disconnected from reality. Even with better cases, benchmarks do not
take into account the time needed to write the code for your particular
application (from the question to the results).
I wrote this benchmark at a time when I overemphasized the raw
performance of the software, and when I was looking for the best
software to choose as a tool for my future career.
Now, what's my choice, ten years later? Not two, not three software
packages... but just ONE: R. I tend to do 95% of my calculations with R
(the rest is ImageJ/Java). Indeed, these benchmark results (and the toy
example of Ajay Shah, a <- a + 1) should be given only marginal weight,
because what is important is how your software tool performs in
real applications, not in simplistic toy examples.
R lags behind Matlab for pure arithmetic calculation... right! But R has
a better object-oriented approach, features more variable types (factor,
for instance), and has a richer mechanism for metadata handling (col/row
names, various other attributes, ...), which makes it better suited than
Matlab to representing complex datasets or analyses. Of course, this
has a small cost in performance.
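To make the point about variable types and metadata concrete, here is a small sketch using only base R. The data, the level names, and the "units" attribute are made up for illustration:

```r
# Illustrative data only: names and values are invented.
treatment <- factor(c("control", "drug", "drug", "control"),
                    levels = c("control", "drug"))
levels(treatment)        # the categories are stored with the variable
counts <- table(treatment)

# Row and column names travel with the matrix.
m <- matrix(1:6, nrow = 2,
            dimnames = list(c("siteA", "siteB"), c("t1", "t2", "t3")))
m["siteB", "t2"]         # index by name rather than by position

# Arbitrary user-defined metadata can be attached as an attribute.
attr(m, "units") <- "mg/L"
attributes(m)
```

Carrying this metadata around is part of the bookkeeping that R does on each operation, which is one plausible source of the small performance cost mentioned above.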
As soon as you think about your problem in a vectorized way, R is one of
the best tools, I think, to go "from the question to the answer" in real
situations. How could we quantify this? I can only imagine big contests
in which experts in each language are presented with real problems and
the time needed to solve them is measured. One should also measure the
robustness, reusability, flexibility, and "elegance" of the code
produced (how to quantify these?). Such a contest between R,
Matlab, Octave, Scilab, etc. is very unlikely to happen.
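As a rough illustration of what "vectorized" means here, the sketch below adds 1 to every element of a vector in two ways. The function names and the vector length are arbitrary choices, and this is not the exact code discussed in the thread. Actual timings depend on the machine and the R version, so none are quoted:

```r
set.seed(42)
x <- runif(1e5)

# Element-by-element loop: one interpreted operation per element.
loop_version <- function(x) {
  out <- numeric(length(x))
  for (i in seq_along(x)) out[i] <- x[i] + 1
  out
}

# Vectorized: a single call operates on the whole vector at once.
vec_version <- function(x) x + 1

t_loop <- system.time(r1 <- loop_version(x))[["elapsed"]]
t_vec  <- system.time(r2 <- vec_version(x))[["elapsed"]]

# Both approaches must agree before the timings mean anything.
stopifnot(isTRUE(all.equal(r1, r2)))
c(loop = t_loop, vectorized = t_vec)
```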
In the end, it is really a matter of personal feeling: you can run your
own little contest by trying to solve a given problem in several
software packages... and then decide which one you prefer. I think many
people do (or did) this, and the still exponential growth of R use (at
least as observed through the increasing number of CRAN packages) is
a good sign that R is probably one of the top performers when
it comes to efficiency "from the question to the answer" in real
problems, not just in little toy examples!
(sorry for being so long; I think I have been missing some interaction
with the R community lately ;-)
Prof. Philippe Grosjean
Numerical Ecology of Aquatic Systems
Mons-Hainaut University, Belgium
Stefan Grosse wrote:
>> I don't have octave (on the same machine) to compare these with.
>> And I don't have MatLab at all. So I can't provide a comparison
>> on that front, I'm afraid.
> Just to add some timings, I was running 1000 repetitions (adding up to
> a=1001) on a notebook with core 2 duo T7200
> R 2.8.1 on Fedora 10: mean 0.10967, st.dev 0.005238
> R 2.8.1 on Windows Vista: mean 0.13245, st.dev 0.00943
> Octave 3.0.3 on Fedora 10: mean 0.097276, st.dev 0.0041296
> Matlab 2008b on Windows Vista: mean 0.0626, st.dev 0.005
> But I am not sure how representative this is with that very simple
> example. To compare Matlab speed with R a kind of benchmark suite is
> necessary. Like: http://www.sciviews.org/benchmark/index.html but that
> one is very old. I would guess that not much has changed: sometimes
> R is faster, sometimes not.
> This difference between the Windows and Linux timing is probably not
> really relevant: when I was comparing the timings of my usual analysis
> there was no difference between the two operating systems. (count data
> and time series stuff)
R-help at r-project.org mailing list
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.