[R] Inconsistency in specifying action for missing data
Martin Maechler
maechler at stat.math.ethz.ch
Sat Sep 3 23:50:47 CEST 2005
>>>>> "Duncan" == Duncan Murdoch <murdoch at stats.uwo.ca>
>>>>> on Sat, 03 Sep 2005 11:40:18 -0400 writes:
Duncan> John Sorkin wrote:
>> A question for R (and perhaps S and SPlus) historians.
>>
>> Does anyone know the reason for the inconsistency in the
>> way that the action that should be taken when data are
>> missing is specified? There are several variants,
>> na.action, na.omit, "T", TRUE, etc. I know that a foolish
>> consistency is the hobgoblin of a small mind, but
>> consistency can make things easier.
>>
>> My question is not meant as a complaint. I very much
>> admire the R development team. I simply am curious.
Duncan> R and S have been developed by lots of people, over
Duncan> a long time. I think that's it.
Yes, but there's a bit more to it.
First, the question was "wrong" (don't you just hate such an answer?):
A more interesting question would have asked why there was
'na.rm = {TRUE, FALSE}'
on one hand and
'na.action = {na.omit, na.replace, .....}'
on the other hand,
since only these two appear as function *arguments*
{at least in `decent' S and R functions}.
There, the answer has at least two parts:
- First, for some functionalities, na.rm = TRUE/FALSE is the
only thing that makes sense, so why should you have to use
something more complicated?
- IIRC, 'na.rm' was introduced much earlier (S version 2)
than 'na.action' (S version 3; with na.replace much later IIRC);
na.action really became relevant only once people started thinking
about model fitting and non-trivial missing value treatment.
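A minimal sketch of the two styles, assuming current R: the vector x and
the data frame d are made-up illustrations, and na.omit / na.exclude (from
the stats package) stand in for the na.action functions mentioned above.
na.replace is not used here.

```r
# na.rm: a simple on/off switch, enough for summary functions
x <- c(1, NA, 3)
mean(x)                # NA, since the missing value propagates
mean(x, na.rm = TRUE)  # 2, the NA is dropped before averaging

# na.action: a *function* that decides how a model frame treats NAs
d <- data.frame(x = 1:5, y = c(2.1, NA, 6.2, 7.9, 10.1))  # illustrative data

fit_omit <- lm(y ~ x, data = d, na.action = na.omit)     # drop incomplete rows
fit_excl <- lm(y ~ x, data = d, na.action = na.exclude)  # drop, but remember where

length(residuals(fit_omit))  # 4: the incomplete row is gone
length(residuals(fit_excl))  # 5: padded with NA at the dropped row
```

Both fits use the same four complete rows, so the coefficients agree. The
choice of na.action only changes how residuals and fitted values are
reported, which is the kind of non-trivial treatment a TRUE/FALSE flag
cannot express.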
Martin Maechler, ETH Zurich