[R] The hidden costs of GPL software?
Tim Cutts
tjrc at sanger.ac.uk
Thu Nov 18 11:27:02 CET 2004
On 17 Nov 2004, at 2:27 pm, Patrick Burns wrote:
> I think Ted Harding was on the mark when he said that it is the help
> system that needs enhancement. I can imagine a system that gets the
> user to the right function and then helps fill in the arguments; all
> of the time pointing them towards the command line rather than away
> from it.
I think this is spot on. My situation is that I am a scientist turned
system administrator, and R is a package which I am increasingly being
asked to install for the use of scientists at this Institute. I am by
no means a statistician; the statistics I learned in A-level maths
almost 20 years ago were as far as I got, and most of that I have
forgotten. But I like to have some understanding of the software
packages I am asked to support, so I've been looking at R with a view
to learning some of its more basic functions. It looks potentially
very useful to me anyway for summarising activity on the supercomputing
cluster that I run.
So I'm a newbie to R, armed with only a very basic knowledge of
statistics (I know the difference between a Normal and a Poisson
distribution at least, and with a bit of prodding could probably
remember a binomial distribution too). I'm an experienced programmer
in several languages, and a PhD-level scientist.
And yet I have still found R really quite hard to learn, and this is
principally because the on-line help is a reference manual. I'm sure
it's a fabulous resource if you're a statistician who uses R every day,
but for me it's not very helpful.
The R Intro PDF is good, but it would be nice if it were better
integrated, with hyperlinks to the reference documentation and to other
parts of the introduction, on those platforms that support such things.
It looks like this was intended for Mac OS X, which is the version I am
playing with for my own use (the version I maintain for users is on
Linux, and would be on Alpha/Tru64 too if I could get it to pass its
tests). However, the on-line help link to the Intro in the Aqua version
of R brings up a blank page, so I'm using the generic PDF document
instead.
I think the GUI question has nothing to do with the hidden costs (or
otherwise) of the GPL. It is the age-old argument of ease of use versus
power and capability.
I don't think a fancy GUI is necessary - the GUI aspects that have been
added to R on Mac OS X are sufficient. I get the impression that the
real power of R is that it is really a programming language, and should
probably be treated and learned as such. Quite apart from the fact
that a GUI will necessarily offer a somewhat restricted subset of the
total functionality, and be a lot slower to use once you've made the
effort to learn the software, I think there is another danger, which I
have already seen in other pieces of software in the bioinformatics
community. Users frequently run completely pointless analyses through
the GUI wrappers we provide. Users of the command line interfaces
typically do much more sensible things.
If you make a piece of software so trivial that a user can use it
without thinking about what they're doing, then users won't think. I
may not know much about statistics, but I do know that working out
exactly which form of analysis or significance test will be meaningful
is a real skill that takes a lot of experience to master. Having to
perform that analysis with written
commands means that your method is recorded, and could be published,
and more importantly be checked and reproduced by other researchers.
It also gives you ample time to think about what you're doing, rather
than just bashing out a pretty graph which actually has no real meaning
whatsoever.
Any GUI to R could (and should) be able to store the command line
equivalent of what it has just done, to satisfy the reproducibility
criterion above, but I suspect it could still lead to some pretty
shoddy work being done by careless and lazy scientists, and we get
enough of that already.
Tim
--
Dr Tim Cutts
Informatics Systems Group, Wellcome Trust Sanger Institute
GPG: 1024D/E3134233 FE3D 6C73 BBD6 726A A3F5 860B 3CDD 3F56 E313 4233