[R] R design (was "Variable passed to function not used in function in select")
Duncan Murdoch
murdoch at stats.uwo.ca
Tue Nov 11 17:12:03 CET 2008
On 11/11/2008 10:28 AM, Terry Therneau wrote:
> I've read the back and forth this morning, and I have to side with Vince.
>
> 1. Functions that re-interpret their arguments are very dangerous. The
> original question involved a well formed call to a function, which returned the
> wrong answer. Bug, design flaw, whatever -- it's a mistake and the best choice
> would be to fix it.
> I only consider such behavior in 2 cases:
> a. when the function is almost never, ever, called from anything but the
> top level. help() is the only example I can think of.
> b. to create a label from an argument, as in plot, but the argument
> itself is left alone to work as it should.
There's another major use for this: model formulas. I like to be able
to write lm(y ~ ., data=df), and I'd really hate to have to evaluate all
the terms in a model formula explicitly.
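As a small illustration of that convenience, here is a sketch with an invented data frame (the names df, y, x1 and x2 are made up for the example). The formula is handed to lm() unevaluated, and the "." is expanded to all the other columns of df:

```r
# Hypothetical data, invented purely for illustration.
df <- data.frame(y  = c(1.1, 1.9, 3.2, 3.9, 5.1),
                 x1 = c(1, 2, 3, 4, 5),
                 x2 = c(0, 1, 0, 1, 0))

# The formula is not evaluated in the usual way: lm() looks up y, x1 and x2
# as columns of df, and "." stands for every column other than the response.
fit_dot <- lm(y ~ ., data = df)

# The same model with every term written out.
fit_explicit <- lm(y ~ x1 + x2, data = df)

# Both calls fit the same model, so the coefficients agree.
all.equal(coef(fit_dot), coef(fit_explicit))
```

Neither y nor x1 nor x2 exists as a variable in the calling environment, so this only works because lm() re-interprets its first argument.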
> One possible fix for subset: first treat the argument formally, and only if that
> simple interpretation fails try the more 'clever' interpretations. Whether this
> is doable or not I can't say.
> 2. The documentation of subset is not in any way clear. I would never have
> been able to diagnose or work around this bug. The issues are very subtle.
> I quite often see "it's in the manual so we bear no blame" as an argument on
> this list. We all need to remember that our view of what we are particularly
> close to is a distorted one -- I for instance think that everything about the
> survival package is crystal clear --- and be particularly open to concerns that
> something is opaque or subtle.
>
> 3. I've heavily used perhaps 20 computing languages in my life. I found S to
> be a refreshing revelation (referring to S of the 1988 Blue manual) precisely
> because it was completely functional. Once I got used to it, this feature made
> it so much more useful, extensible, understandable than other things I'd used.
I don't know your definition of "completely functional", but I don't
think S or R has ever been that. It has always been possible to refer to
non-local variables within a function (their meaning differs between S
and R, and I think R's lexical scoping makes it a bit more functional
here), to make super-assignments, and to do lots of other things that
have side effects.
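A minimal sketch of what is meant by non-local variables and super-assignment (the names counter and bump are invented for the example):

```r
counter <- 0

# bump() reads a variable it does not define and changes it with <<-,
# so calling it has a side effect outside the function.
bump <- function() {
  counter <<- counter + 1
  invisible(counter)
}

bump()
bump()
counter  # 2: the two calls changed a variable defined outside bump()
```

A purely functional language would not allow the value of counter to depend on how many times bump() had been called.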
> R is becoming less and less a functional language (hidden functions and
> dependencies with environments for one), I quite often cannot figure out either
> exactly what a function calls or how to get it to stop doing it.
I don't know what you mean here. Are you talking about recent changes?
(Which ones?) Or are you talking about older things, like namespaces?
Or closures, which have been in R from the beginning (and which are
part of why I'd call it more functional than S)?
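For anyone unsure what is meant by closures, here is a short sketch (make_adder and the other names are hypothetical): a function created inside another function keeps access to the variables of the environment in which it was created.

```r
# make_adder() returns a new function that remembers the value of n
# from the call that created it.
make_adder <- function(n) {
  function(x) x + n
}

add2  <- make_adder(2)
add10 <- make_adder(10)

add2(5)   # 7
add10(5)  # 15
```

Each returned function carries its own n, which is the sense in which functions in R are values bundled with an environment.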
> I am not sure
> we have gained with each choice of "convenience" or sophistication over
> functional purity. I want "scan(file=myfile)" to continue to return "variable
> myfile not found" when I forget the quotes.
R allows a lot of flexibility in how arguments are handled, and there's
been some experimentation with different variations. Remember that R is
partly a laboratory in which people are trying to invent new ways of
doing statistical computing, and also remember that R (including its
contributed packages) has hundreds of authors, not all of whom agree on
the best way to do things. The benefit of this is that more stuff gets
done: I'm not forced to adopt your ideas of The Right Way to Do Things,
so I can get down to coding in the way I like. The disadvantage is that
things can be inconsistent, so people are forced to read the
documentation, and the documentation is always imperfect.
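To make the kinds of argument handling under discussion concrete, here is a rough sketch with three invented functions (show_value, show_labelled and name_of are hypothetical, not part of R). The first evaluates its argument normally, the second also captures the expression to build a label (as plot does), and the third never evaluates its argument at all:

```r
# Standard evaluation: only the value of the argument is used.
show_value <- function(x) x

# The expression is captured for a label, but x itself is still
# evaluated in the usual way.
show_labelled <- function(x) {
  label <- deparse(substitute(x))
  paste(label, "=", x)
}

# The argument is re-interpreted: only the expression is used, so a
# name that does not exist causes no error.
name_of <- function(x) deparse(substitute(x))

myvalue <- 42
show_labelled(myvalue)     # "myvalue = 42"
name_of(no_such_variable)  # "no_such_variable", no error

# With standard evaluation the missing variable is reported as an error,
# caught here so that the script keeps running.
res <- tryCatch(show_value(no_such_variable),
                error = function(e) conditionMessage(e))
res
```

The third style is the one that surprises people when the function is called from inside another function rather than from the top level.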
>
> I am stumped by the R results I get too often, and I'm not a novice. That
> said, good design is hard. I spend a lot of time on that aspect in the survival
> package and there are still bits where the 'right' way is only clear after
> several years' experience. I do occasionally make non-backwards-compatible
> changes. The R core team has done an amazing job on the whole.
If I'm not mistaken, you are still an S user as well as an R user, and
this is a bit of a disadvantage: at a fundamental level, they are
different languages, though they look superficially similar. I haven't
used S in quite a few years, so I expect I'd be stumped by the results I
got there in a lot of cases. I think that in the main R is a simpler,
easier language to understand, but there are certainly bits and pieces
of it where it is not easy.
> And let's not shoot the bearers of bad news.
I think we can discuss what's good and what's bad about the language
without bringing out the guns or insults.
Duncan Murdoch