[R] Systematic treatment of missing values
David Soloveichik
dsolov at caltech.edu
Tue May 30 10:34:02 CEST 2006
Thank you very much for your prompt reply and for adding the comments
to the help pages for match and ==. I think the source of my
confusion was that by looking at the current documentation (v 2.3.0)
I did not realize that matching is different from equality testing.
(Obviously in the case of using regular expressions, etc., it is
different, but I thought that when using plain "match" and %in%,
matching would be determined by ==.)
Also I did not mean for my first comment to sound like a criticism of
R for treating NAs inconsistently. Nonetheless I am still curious
why the particular choice was made that "match" (and therefore %in%)
acts differently from "==" with respect to NA's and NaN's (with the
default and the only implemented value of the "incomparables"
parameter)?
Thank you,
David
On May 28, 2006, at 1:10 AM, Prof Brian Ripley wrote:
> You start with very general comments, but only use one specific
> function, match (see ?"%in%", a help page entitled `value matching').
>
> Matching and equality are treated differently. By definition, NA
> matches NA and nothing else, and NaN matches NaN and nothing else.
> In comparisons, these values are not comparable.
>
> As you will have seen from the help page, match() has the expansion
> capacity for declaring values non-comparable. That has not been
> implemented for a decade and no one has supplied code to implement
> it, so it seems no one has much need of it.
>
> I have added notes to the help pages for match and == to say
> explicitly what matches and what is comparable. If the *Draft* R
> Language Definition were ever to be finished it would have such
> details: it already has a useful commentary.
>
> On Sat, 27 May 2006, David Soloveichik wrote:
>
>> I am wondering whether there is a well-accepted approach to handling
>> missing values (NA's) in a programming language such as R. For
>> example, most functions seem to propagate NA to the output when the
>> value of the missing entry could have mattered. In other words, most
>> functions are not willing to "take a stand" on what the missing value
>> was. However, some functions don't seem to do this. For example,
>>
>> > c(1,2,3,NA) %in% c(2,3)
>> [1] FALSE  TRUE  TRUE FALSE
>>
>> rather than: FALSE TRUE TRUE NA
>>
>>
>> Also, what is the logic of the following:
>> > c(1,2,3,NA) %in% c(2,3,NA)
>> [1] FALSE  TRUE  TRUE  TRUE
>>
>> Why is the last output value TRUE? Why does R claim that the NA on
>> the left hand side of %in% is the same as the NA on the right hand
>> side of %in%?
>
> It does not: it reports that it *matches*. Please do read the help
> page before posting, as the posting guide asked you to.
>
>> PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
>
> --
> Brian D. Ripley, ripley at stats.ox.ac.uk
> Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
> University of Oxford, Tel: +44 1865 272861 (self)
> 1 South Parks Road, +44 1865 272866 (PA)
> Oxford OX1 3TG, UK Fax: +44 1865 272595
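
For readers following the thread, the distinction between comparison (==) and matching (match, %in%) can be sketched in a few lines of R. This is only an illustration of the behaviour described above, not code from either poster. The helper in_na() is a hypothetical name introduced here; it assumes the reader wants the NA-propagating result asked about in the original question (FALSE TRUE TRUE NA).

```r
x <- c(1, 2, 3, NA)

# Comparison: NA is not comparable, so every result is NA.
eq <- x == NA_real_

# Matching: NA matches NA and nothing else.
in1 <- x %in% c(2, 3)        # no NA in the table, so the NA in x does not match
in2 <- x %in% c(2, 3, NA)    # NA in the table, so the NA in x matches
pos <- match(x, c(2, 3, NA)) # positions of the matches; NA here means "no match"

# NaN matches NaN, but NaN and NA do not match each other.
nan_vs_na  <- NaN %in% NA_real_
nan_vs_nan <- NaN %in% NaN

# Hypothetical helper (not part of R): like %in%, but returns NA
# wherever x itself is missing. Note that is.na() is also TRUE for NaN.
in_na <- function(x, table) {
  res <- x %in% table
  res[is.na(x)] <- NA
  res
}

print(in1)
print(in2)
print(in_na(x, c(2, 3)))
```

The helper only handles missing values in x. It does not attempt to reason about missing values in the table, where a fully NA-propagating definition would arguably also return NA for any non-matching element.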