[R] two questions for R beginner

Paul Hiemstra p.hiemstra at geo.uu.nl
Tue Mar 2 15:01:05 CET 2010

Brandon Zicha wrote:
> Hey Paul,
Hey Brandon, (adding R-help in the cc)

I agree with you that the documentation of R could be better, especially 
with more examples in code showing not only the common cases, but also 
more esoteric cases. It would be great if everyone invested a lot of 
time to write awesome documentation, but this is not the case. I just 
objected to the tone (I thought :)) I spotted. Some more comments are inline:
> Accepting the main point of my post - that the often VERY incomplete 
> help files appended to packages can be a major stumbling block for 
> getting up and running in R - I take your point.  I probably went a 
> bit too far with my language there.
> I would point out though that a great many parts of research (like 
> writing a bibliography - or searching for citations of any kind 
> usually) aren't much fun, but are an important part of research 
> related work.  Likewise, complete documentation (by which I hardly 
> mean a paper - looking at STATA help files as a minimum would be a 
> good start) is part of programming.  I agree that one needs to employ 
> some level of judgement, otherwise you will get a help file that says 
> "First turn on the computer... then click the 'R' Icon...."  But, I 
> have myself created one or two STATA functions that I have put up for 
> public use - so I know how not fun, but necessary, complete 
> documentation is.  Further, I didn't say that writing documentation 
> doesn't take time.  Everything takes time. My point was that relative 
> to actually creating the application - writing more complete 
> documentation takes very little time. If one invests the time to do 
> the 'fun' stuff of writing a new package for R, it seems reasonable 
> that taking the (proportionately) little time to write a nicer help 
> file would be the most 'professional' thing to do.  But, this could be 
> my illusion that all researchers see themselves as professionals - 
> rather than an anarchic egoistic enclave of independent 
> self-interested paper producers.
Producing papers is what scientists get judged upon, not how much software 
they publish or how good their documentation is. Furthermore, it is quite 
hard for a hardcore R programmer to judge what people find hard about 
their software.
> I am notorious for assuming greater standards as an acceptable 'norm' 
> than my community at large :-)  Furthermore, you are absolutely right 
> that my standards are apparently even too high for many commercial 
> applications!  R help is sometimes downright good!
> So, if I accept that I am a demanding S.O.B. and tone down my thoughts 
> of proper documentation and professionalism and adopt the (probably 
> more) reasonable perspective you do at the end of  "well, this is the 
> world we live in... and come on it's free" I totally agree that I 
> probably went too far!  But, better yet, I think that this observation 
> you make suggests a solution: Perhaps R could use a more integrated 
> and organized open source help system. I can think of a few 
> possibilities - the easiest being a wiki version of R help.  This way 
> users could add useful information to help files - such as more 
> examples, tricks, tips, and known problems.  This would take advantage 
> of the open source, free, user-community centered aspects of R, and 
> permit those with an interest in helping beginners to post notes for 
> beginners - on the help files.  I know that if such a wiki existed I 
> would have posted the example of constrained optimization I did 
> recently.   It wouldn't be too difficult to add a function 
> wikihelp(X) that would open the wiki help page rather than the 
> standard help documentation.  Currently, help on any given command is 
> scattered across help fora all over the web.  A central, indexed, 
> and easily referenced help system might be a solution.  Heck, such a 
> system could go a step further and link R-help listserv archives by 
> command thus centralizing and integrating the open-source user-built 
> information resource of the listserv into help().  How many e-mails to 
> this listserv begin with 'I just spent a few hours cruising the help 
> forums related to 'X' and couldn't find an answer.'
Sounds like a good addition, allowing people to add to the documentation 
as they see fit. There is of course the R wiki, but this is not widely 
used and not firmly embedded into R itself. But how would we keep the 
system you propose manageable and prevent it from becoming an enormous 
mess? Maybe some kind of moderation?
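
For what it's worth, the wikihelp(X) function mentioned above could be 
quite small. A rough sketch in R, assuming a wiki with one page per help 
topic; the base URL is a made-up placeholder (not an existing site) and 
wikihelp_url() is a hypothetical helper, so treat this as an illustration 
only:

```r
# Hypothetical base URL: the thread does not name an actual wiki address.
wiki_base <- "https://wiki.example.org/rhelp"

# Build the (assumed) wiki page URL for a help topic given as a string.
wikihelp_url <- function(topic, base = wiki_base) {
  stopifnot(is.character(topic), length(topic) == 1L, nzchar(topic))
  # Percent-encode the topic so names such as "[<-" survive in a URL.
  paste0(base, "/", utils::URLencode(topic, reserved = TRUE))
}

# Open the wiki page in the default browser, analogous to help(topic).
wikihelp <- function(topic, base = wiki_base) {
  url <- wikihelp_url(topic, base)
  # Only launch a browser in an interactive session.
  if (interactive()) utils::browseURL(url)
  invisible(url)
}
```

Calling wikihelp("optim") would then open the wiki page for optim next to 
the regular help("optim"). Moderation and linking to the list archives 
would have to live on the wiki side, not in this function.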
> I note that STATA has all their help files for the latest version of 
> stata available on the web (http://www.stata.com/help.cgi?contents).  
> How difficult would a similar system - only with R, editable and with 
> links to supplementary information - be to set up?  I can't imagine it 
> would be horribly expensive in terms of set up costs.
A problem is that there is no company marketing R that could set this 
up; the community is much looser, much more open source. Probably the R 
core team would be the closest thing we have.
> What do you think?
> Best,
> Brandon Z
> On Mar 2, 2010, at 1:16 PM, Paul Hiemstra wrote:
>> Brandon Zicha wrote:
>>>>>> What were your biggest misconceptions or
>>>>>> stumbling blocks to getting up and running
>>>>>> with R?
>>> Easy.  In terms of materials I have been unable to find good books 
>>> that introduce users to R from the perspective of someone familiar 
>>> only with packages like SPSS or STATA, or not familiar with 
>>> statistics packages at all.  Even introduction texts use jargon 
>>> without introducing it.
>>> I think that R-help files should be more thorough than they are, and 
>>> contain more examples.  I thought that STATA help files were 
>>> sparse!  The notion that 'R is a user community and thus they do 
>>> this in their spare time' is no excuse for those creating new tools 
>>> for R not developing complete help files.  It doesn't take that much 
>>> time relative to actually creating the new function.
>> Hi Brandon,
>> I would disagree with your point that documentation doesn't take much 
>> time. Writing documentation that is suitable for both the advanced 
>> user (being a reference, and thus preferably short) and the beginning 
>> user (being sort of a tutorial, and thus preferably longer) is 
>> quite a challenge, comparable to writing a good paper. Apart from the 
>> fact that it takes quite a while, it is also not much fun. Often 
>> people develop packages for their own research and put the software 
>> online so others can benefit; they don't need the documentation 
>> themselves and don't get paid to write the documentation.
>> So saying 'it's no excuse' really goes too far in my view. R is free, 
>> you did not pay several thousands of euros giving you the right for 
>> good support. Even the support is free through the mailing list. You 
>> can get a paid version of R at Revolution Computing. Then you can 
>> call them if there are problems. I'm not meaning to offend anybody, 
>> but I didn't agree with "is no excuse for those creating new tools 
>> for R not developing complete help files".  Partly the strength of R 
>> is in the open source, but sometimes, as with documentation, this can 
>> bite you. But I think the R docs aren't that bad, I've seen 
>> proprietary software that does a worse job than R.
>> my 2euro on the subject :),
>> Cheers,
>> Paul
>>> In terms of actual R use - creating, using, and manipulating data 
>>> are the biggest frustration for those of the 'spreadsheet 
>>> generation'.  I get the impression that one needs to not merely 
>>> understand, but be fully fluent in the jargon of matrix mathematics 
>>> to even know what is going on half the time.  I find myself - even 
>>> now - using 'rules of thumb' that 'seemed to work' rather than fully 
>>> understanding what I am doing.  It is particularly discouraging when 
>>> many of those 'intro books' suggest using something besides R for 
>>> data manipulation - how clumsy is that!?
>>> I find the actual programming syntax itself is the easiest part to 
>>> master.  It is certainly more flexible - but without a particularly 
>>> significant increase in complexity - than trying to write scripts in 
>>> SPSS and STATA.
>>> Brandon Zicha
>> -- 
>> Drs. Paul Hiemstra
>> Department of Physical Geography
>> Faculty of Geosciences
>> University of Utrecht
>> Heidelberglaan 2
>> P.O. Box 80.115
>> 3508 TC Utrecht
>> Phone:  +3130 274 3113 Mon-Tue
>> Phone:  +3130 253 5773 Wed-Fri
>> http://intamap.geo.uu.nl/~paul

