# odd result of length() with factor arguments with NA (PR#354)

**ripley@stats.ox.ac.uk**

*Wed, 1 Dec 1999 08:47:51 +0100 (MET)*

On Wed, 1 Dec 1999 rnassar@duke.edu wrote:
> The following looks odd to me, but it may well be that I'm doing
> something I shouldn't:
>
> x <- c(rep("a",5),NA,rep("b",7))
> X <- as.factor(x)
> length(X)
> # [1] 13
> length(X[X=="a"])
> # [1] 6 I expected 5
> length(X[X=="b"])
> # [1] 8 I expected 7
> length(X[is.na(X)])
> # [1] 1 yes
> length(X[X=="d"])
> # [1] 1 but there is no "d"

This is correct: it is your expectations that are wrong. It helps to
look at the result.

> X=="a"
 [1] TRUE TRUE TRUE TRUE TRUE NA FALSE FALSE FALSE FALSE FALSE FALSE
[13] FALSE
> X=="d"
 [1] FALSE FALSE FALSE FALSE FALSE NA FALSE FALSE FALSE FALSE FALSE FALSE
[13] FALSE
> X[X=="a"]
[1] a a a a a NA
Levels: a b

What is going on is that the NA results are regarded as genuinely unknown.
So the sixth entry may or may not be "a": we just don't know, and the
only thing we can do is to record NA. That applies when we subset too:
indexing by an NA gives an NA element. The sixth entry might equally be
"d", which is why X[X=="d"] has length 1.
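
As a sketch of how to count or select only the entries known to equal "a", assuming x and X as defined in the quoted message (the which() and na.rm forms below are illustrations, not taken from the original reply):

```r
x <- c(rep("a", 5), NA, rep("b", 7))
X <- as.factor(x)

# Logical indexing keeps an NA element wherever the index itself is NA.
length(X[X == "a"])                 # 6: five "a" plus one NA

# which() returns only the positions where the comparison is TRUE,
# so the NA comparison is dropped.
length(X[which(X == "a")])          # 5

# Equivalent: require the comparison to hold and the value to be non-missing.
length(X[X == "a" & !is.na(X)])     # 5

# To count rather than subset, sum the TRUEs and ignore the NAs.
sum(X == "a", na.rm = TRUE)         # 5
sum(X == "d", na.rm = TRUE)         # 0
```

Which form to prefer is a matter of taste; which() is perhaps the shortest when the NA comparisons should simply be ignored.
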
Now if you meant NA to be a _level_ of the factor, use

> X <- factor(x, exclude="")
> X[X=="a"]
[1] a a a a a
Levels: a b NA

which might be what you expected. That treats NA as a category of its
own: missing, and known to be neither "a" nor "b".
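
In more recent versions of R the same idea is often written with exclude = NULL or addNA(). A minimal sketch, assuming a current base R; neither form appears in the original message, and the names X1 and X2 are only for illustration:

```r
x <- c(rep("a", 5), NA, rep("b", 7))

# exclude = NULL keeps NA as a level instead of dropping it.
X1 <- factor(x, exclude = NULL)
levels(X1)                      # "a" "b" NA

# addNA() adds NA as a level to an existing factor.
X2 <- addNA(factor(x))

# With NA as a level, the comparison is FALSE (not NA) for the sixth entry,
# so subsetting no longer carries an NA element along.
length(X1[X1 == "a"])           # 5
length(X2[X2 == "b"])           # 7

# The sixth entry now has a valid code, so is.na() on the factor is FALSE
# everywhere; test the original vector (or the level labels) instead.
sum(is.na(X1))                  # 0
sum(is.na(x))                   # 1
```
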
--
Brian D. Ripley, ripley@stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272860 (secr)
Oxford OX1 3TG, UK Fax: +44 1865 272595