[R] A comment about R:

Thu Jan 5 08:20:41 CET 2006

>> On Wed, 4 Jan 2006, Roger Bivand wrote:
>> > Could I ask for comments on:
>> >
>> > source(url("http://spatial.nhh.no/R/etc/capabilities.R"), echo=TRUE)
>> >
>> > as a reproduction of the Stata capabilities session? Both the t test and
>> > the chi-square from our side point up oddities. I didn't succeed in
>> > putting fit lines on a grouped xyplot, so backed out to base graphics.
>> > This could be Swoven, possibly using the RweaveHTML driver.
>> >
>>
Excellent!  I will point out, though, that the Stata summarize command is a 
little different from R's summary() function: it reports the number of 
observations, mean, standard deviation, minimum and maximum.  It is a little 
more like:

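 summarize <- function(x) {
   # Like Stata, count and summarize only the non-missing values
   x <- x[!is.na(x)]
   # Avoid reusing names such as sd, min and max for local variables
   stats <- c(Obs = length(x), Mean = mean(x), "Std. Dev." = sd(x),
              Min = min(x), Max = max(x))
   print(stats)
   invisible(stats)
 }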

As a user of statistics rather than a statistician, I have to agree with the 
original author, whose premise was that different statistical packages have 
different strengths.  Reading between the lines, I think his comments on R 
were based mostly on what he had heard from friends.  For those of us in the 
back rows, any statistical tool is only as easy as our mentors make it.  At 
my institution there is a paucity of good mentors, and I have found the 
learning curve equally steep for Stata 7, for which I have many, many volumes 
of documentation, and for R, where I have benefited greatly from several of 
the terrific contributed documents and books already mentioned.

The original article was about the strengths of SAS, Stata, and SPSS for 
carrying out 'traditional statistics'.  What are R's strengths?  Too numerous 
to mention, in the hands of the right users.  However, I would point to tools 
like those at the Bioconductor site as a broad illustration of the nearly 
infinite flexibility and extensibility of R for specialized statistical 
tasks.  Does this mean that R is a poor choice for basic, traditional 
procedures?  Hardly!  Well-written documentation such as John Fox's car, 
Peter Dalgaard's ISwR, and John Verzani's Simple R contributed documentation 
puts introductory statistical procedures in R within easy grasp of users.  I 
have found that non-statistics students catch on rapidly with 
'problem-specific' guidance once they get past the lack of a GUI (R Commander 
is certainly a solution there).  As the number of R mentors grows to rival 
those of SAS, Stata, and SPSS, everyday tasks in R might even appear easier 
to new initiates than the corresponding syntax and thought processes in the 
other programs.

So, what are R's major weaknesses?  I do not think they are statistical.  
Rather, the weakness is a shortage of 'mentors' who have gone before, done 
the type of analysis that you (the end user) wish to do, and graciously left 
behind a paper trail showing how to address that specific statistical task 
syntactically.  There is a huge amount of such material out there, but it is 
hard to find at the beginning.  [BTW: This mailing list is of course a 
tremendous resource, so why should we not read the posting guide out of 
simple respect for those who have given it to us?  I don't like getting 
flamed either, but darn it, sometimes I deserve it ;-).]

Finally, this thread has made me think back three or four years to when I 
first discovered R.  The thing that frustrated me most in the early weeks was 
getting data into R.  It took me no time to learn to generate data from all 
kinds of distributions, to discover the built-in datasets through the data() 
function, or to enter data a number at a time with the c() function.  BUT HOW 
was I to get the datasets (spreadsheet, database) from my laboratory into R?  
This was somehow much easier to figure out in the other (often GUI) 
statistical environments I have used.  [Of course, I finally discovered the 
documentation for the foreign package and later learned about RODBC, and I 
was blown away by the flexibility available.]
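For anyone facing the same hurdle, here is a minimal sketch of the routes mentioned above, assuming a small hypothetical lab table (`lab`, with made-up columns `sample`, `dose` and `response`) in place of real laboratory data.  It round-trips the table through a temporary CSV file (the usual spreadsheet export) with base R's read.csv(), then through a Stata .dta file with the foreign package's write.dta() and read.dta().  The RODBC lines are left commented out because they need a configured ODBC data source; the DSN and table names there are placeholders, not real ones.

```r
# Hypothetical lab data standing in for a spreadsheet or database export
lab <- data.frame(sample   = c("A1", "A2", "B1"),
                  dose     = c(0.5, 1.0, 2.0),
                  response = c(3.2, 4.8, 7.9),
                  stringsAsFactors = FALSE)

# Route 1: a spreadsheet saved as CSV, read with base R's read.csv()
csv_file <- tempfile(fileext = ".csv")
write.csv(lab, csv_file, row.names = FALSE)
from_csv <- read.csv(csv_file, stringsAsFactors = FALSE)
str(from_csv)

# Route 2: a Stata .dta file, read with the 'foreign' package (shipped with R)
if (requireNamespace("foreign", quietly = TRUE)) {
  dta_file <- tempfile(fileext = ".dta")
  foreign::write.dta(lab, dta_file)
  from_dta <- foreign::read.dta(dta_file)
  print(summary(from_dta))
}

# Route 3: a database table over ODBC with RODBC (not run; needs a configured DSN).
# "my_lab_dsn" and "samples" are hypothetical names.
# ch  <- RODBC::odbcConnect("my_lab_dsn")
# dat <- RODBC::sqlFetch(ch, "samples")
# RODBC::odbcClose(ch)
```

In practice you would point read.csv() or read.dta() at your own file path instead of a tempfile().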

Well, just the thoughts of one end-user type...

Rob