[R] The hidden costs of GPL software?

Tim Cutts tjrc at sanger.ac.uk
Thu Nov 18 11:27:02 CET 2004


On 17 Nov 2004, at 2:27 pm, Patrick Burns wrote:

> I think Ted Harding was on the mark when he said that it is the help
> system that needs enhancement.  I can imagine a system that gets the
> user to the right function and then helps fill in the arguments; all
> of the time pointing them towards the command line rather than away
> from it.

I think this is spot on.  My situation is that I am a scientist turned 
system administrator, and R is a package which I am increasingly being 
asked to install for the use of scientists at this Institute.  I am by 
no means a statistician; the statistics I learned in A-level maths 
almost 20 years ago were as far as I got, and most of that I have 
forgotten.  But I like to have some understanding of the software 
packages I am asked to support, so I've been looking at R with a view 
to learning some of its more basic functions.  It looks potentially 
very useful to me anyway for summarising activity on the supercomputing 
cluster that I run.

So I'm a newbie to R, armed with only a very basic knowledge of 
statistics (I know the difference between a Normal and a Poisson 
distribution at least, and with a bit of prodding could probably 
remember a binomial distribution too).  I'm an experienced programmer 
in several languages, and a PhD-level scientist.

And yet I have still found R really quite hard to learn, and this is 
principally because the on-line help is a reference manual rather than 
a tutorial.  I'm sure it's a fabulous resource if you're a statistician 
who uses R every day, but for me it's not very helpful.

The R Intro PDF is good, but it would be nice if it were better 
integrated, with hyperlinks to the reference documentation and to other 
parts of the introduction, on those platforms that support such things.  
It looks as though this was intended for Mac OS X, which is the version 
I am playing with for my own use (the version I maintain for users is 
on Linux, and would be on Alpha/Tru64 too if I could get it to pass its 
tests).  However, the on-line help link to the Intro in the Aqua 
version of R brings up a blank page, so I'm using the generic PDF 
document instead.

I think the GUI question has nothing to do with the hidden costs of the 
GPL, or otherwise.  This is the age-old ease-of-use versus power and 
capability argument.

I don't think a fancy GUI is necessary - the GUI aspects that have been 
added to R on Mac OS X are sufficient.  I get the impression that the 
real power of R is the fact that really it's a programming language, 
and should probably be treated and learned as such.  Quite apart from 
the fact that a GUI will necessarily be a somewhat restricted subset of 
the total functionality, and a lot slower to use once you've taken the 
effort to learn the software, I think there is another danger, which I 
have already seen in other pieces of software in the bioinformatics 
community.  Users frequently run completely pointless analyses through 
the GUI wrappers we provide.  The users using the command line 
interfaces typically do much more sensible things.

If you make a piece of software trivial for a user to use without 
thinking about what they're doing, then the users won't think.  I may 
not know much about statistics, but I do know that working out exactly 
which form of analysis or significance test is meaningful for a given 
problem is a real skill that takes a lot of experience to master.  
Having to perform that analysis with written 
commands means that your method is recorded, and could be published, 
and more importantly be checked and reproduced by other researchers.  
It also gives you ample time to think about what you're doing, rather 
than just bashing out a pretty graph which actually has no real meaning 
whatsoever.
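
As a rough illustration of what I mean by a recorded method, a written 
analysis can be as short as the sketch below.  The job counts are 
invented for the example (they are not real data from our cluster), and 
the variable names are just ones I made up:

```r
# Hypothetical example: the job counts below are made up for illustration.
jobs_per_hour <- c(12, 15, 9, 20, 14, 11, 17, 13)

# Summarise the counts; every step is a written command that can be rerun.
hourly_mean <- mean(jobs_per_hour)
hourly_var  <- var(jobs_per_hour)

# For Poisson-distributed counts the variance should be close to the mean,
# so this ratio is a crude check on whether a Poisson model is plausible.
dispersion <- hourly_var / hourly_mean

print(summary(jobs_per_hour))
print(dispersion)
```

Saved in a file (say cluster_summary.R, a hypothetical name), this can 
be rerun by anyone with source("cluster_summary.R"), which is exactly 
the kind of record that a sequence of mouse clicks does not leave behind.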

Any GUI to R could (and should) be able to store the command line 
equivalent of what it has just done, to satisfy the reproducibility 
criterion above, but I suspect it could still lead to some pretty 
shoddy work being done by careless and lazy scientists, and we get 
enough of that already.

Tim

-- 
Dr Tim Cutts
Informatics Systems Group, Wellcome Trust Sanger Institute
GPG: 1024D/E3134233 FE3D 6C73 BBD6 726A A3F5  860B 3CDD 3F56 E313 4233



