[R] Using R for Production - Discussion
santosh.srinivas at gmail.com
Tue Nov 2 05:04:54 CET 2010
This is an open-ended question.
I have been quite fascinated by what I can do, and by the control I have
over my work, since I started using R.
So far I have used it mainly for analytics-related work on my desktop.
My experience has been quite good: most of the issues I need to
investigate and solve are typical data problems (data errors, format
corruption, etc.), not necessarily "R" related.
Complementing this with Python gives enough firepower to do a lot of
production (analytics-related) work on the cloud (from my research,
every innovative technology provider seems to support Python:
Google, Amazon, etc.).
Questions on using R for production activities:
Q1) Does anyone have experience using R scripts etc. for production-related
activities? E.g. serving a computational / analytical /
simulation environment from a web portal with the analytical processing done
I've seen that most useful things for normal (not rocket science) business
(80-20 rule) can be done just as well in R as in tools like
SAS, Matlab, etc.
Q2) I haven't tried my processing routines on much larger data sets,
on the assumption that "size" is not a constraint nowadays.
I know that I should try it out, but any forewarnings would help. Is
something that works for my "desktop" dataset likely to work just as well
when scaled up to a "cloud" dataset,
assuming that I clear out unused objects, avoid
infinite loops, etc.?
I.e., is there any problem with the "fundamental architecture of R itself"
(as press articles often say)?
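To make the "clearing out of unused objects" point concrete, here is a minimal sketch of what I mean. The object names big and col_means are made up for illustration, and the sizes are arbitrary; rm(), gc() and object.size() are standard R functions. This only shows the housekeeping step, not whether it is enough at cloud scale.

```r
# Hypothetical example: build a large temporary object, keep only a summary.
big <- matrix(rnorm(1e6), ncol = 100)    # 1e6 doubles, roughly 8 MB
col_means <- colMeans(big)               # the small result we actually need

print(object.size(big), units = "MB")    # report the size of the temporary

rm(big)           # drop the binding so the memory can be reclaimed
invisible(gc())   # request a garbage collection (R also does this on its own)

# The temporary is gone, the summary remains.
stopifnot(!exists("big", inherits = FALSE), length(col_means) == 100)
```

Calling gc() explicitly is mostly useful for seeing what was freed; R collects garbage automatically when it needs memory.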
Q3) There are big fans of the SAS, Matlab and Mathworks environments out there
... does anyone have a comparison of how R fares?
From my experience, R is quite neat and low level ... so overheads should be
Most slowness comes from lack of knowledge (see my code ... like using the
wrong structures, functions, loops, etc.) rather than from something wrong
with R itself.
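As an illustration of the kind of "wrong structures, loops" mistake I have in mind, here is a small sketch comparing a vector grown inside a loop with a vectorised call. The function names squares_grow and squares_vec are hypothetical, and the timings will differ from machine to machine, so I give no expected numbers.

```r
# Hypothetical illustration: two ways to compute the squares of 1..n.

# Growing a vector element by element forces R to copy it repeatedly.
squares_grow <- function(n) {
  out <- c()
  for (i in seq_len(n)) {
    out <- c(out, i^2)   # re-allocates and copies the whole vector each time
  }
  out
}

# Vectorised version: one arithmetic operation on the whole vector.
squares_vec <- function(n) {
  seq_len(n)^2
}

n <- 1e4
# Both approaches must give the same answer.
stopifnot(isTRUE(all.equal(squares_grow(n), squares_vec(n))))

# Compare the elapsed times on your own machine.
print(system.time(squares_grow(n)))
print(system.time(squares_vec(n)))
```

The point is only that the slow version is slow because of how it is written, not because of anything in R itself.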
Perhaps there is no "commercial" focus on improving performance,
but my guess is that it is just a matter of time until the community evolves
the language to score higher on that too,
and perhaps develops documentation to help challenged users with
"performance tips" (the "ten commandments" type).
Q4) You must have heard about the latest comment from James Goodnight of SAS
... "We haven't noticed that a lot. Most of our companies need industrial
strength software that has been tested, put through every possible scenario
or failure to make sure everything works correctly."
My "gut" feeling is that random passionate geeks (playing part-time) do better
testing than an army of professionals ... (but I've no empirical evidence
I am not taking a side here (although I appreciate those who do!), but
am looking for objective reasoning.