[Rd] Dropping unused levels of a factor that has "NA" as a level
Prof Brian Ripley
ripley at stats.ox.ac.uk
Wed Jul 19 12:17:13 CEST 2006
It is history:
r16144 | ripley | 2001-09-28 19:40:28 +0100 (Fri, 28 Sep 2001) | 2 lines
add is.na<-, distinguish NA level and NA codes in factors
so it predates having NA character strings distinct from "NA".
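
To illustrate the distinction that log entry refers to, here is a minimal sketch (the names f, codes and lev are made up for illustration) of a factor that has "NA" as a level, with one code then set to missing via is.na<-. It assumes a reasonably recent R, where NA_character_ and the string "NA" are distinct.

```r
# Sketch only: a factor whose levels include the string "NA".
f <- factor(c("a", "NA"), levels = c("a", "NA"))

# Both elements are observed; the second simply has the *level* "NA",
# so neither code is missing at this point.
is.na(f)

# Mark the second element as missing with is.na<-. This sets the
# underlying code to NA and leaves the set of levels untouched.
is.na(f)[2] <- TRUE

codes <- as.integer(f)  # underlying integer codes
lev <- levels(f)        # levels still include the string "NA"
```

After the assignment the second code is missing, while "NA" remains a level of the factor.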
On Tue, 11 Jul 2006, Brahm, David wrote:
> I mentioned this in R-help on April 28:
> <https://stat.ethz.ch/pipermail/r-help/2006-April/104595.html>
>
> | as.character.factor contains this line (where cx=levels(x)[x]):
> | if ("NA" %in% levels(x)) cx[is.na(x)] <- "<NA>"
> |
> | Is it possible that this is no longer the desired behavior? These
> | two results don't seem very consistent:
> |
> | > as.character(as.factor(c("AB", "CD", NA)))
> | [1] "AB" "CD" NA
> | > is.na(.Last.value)[3]
> | [1] TRUE
> |
> | > as.character(as.factor(c("NA", "CD", NA)))
> | [1] "NA" "CD" "<NA>"
> | > is.na(.Last.value)[3]
> | [1] FALSE
> |
> | I'm using R-2.3.0 on Redhat Linux, but I don't think the behavior
> | is new (maybe since character NA's were introduced?).
> |
> | -- David Brahm (brahm at alum.mit.edu)
>
>
> -----Original Message-----
> From: r-devel-bounces at r-project.org [mailto:r-devel-bounces at r-project.org] On Behalf Of Peter Dalgaard
> Sent: Tuesday, July 11, 2006 5:59 PM
> To: J. Hosking
> Cc: r-devel at stat.math.ethz.ch
> Subject: Re: [Rd] Dropping unused levels of a factor that has "NA" as a level
>
> "J. Hosking" <jh910 at juno.com> writes:
>
> > Is this a bug?
> >
> > > f1 <- factor(c("a", NA), levels = c("a", "NA") )
> > > f2 <- f1[, drop = TRUE]
> > > f2
> > [1] a    <NA>
> > Levels: a <NA>
> >
> > I would have expected f2 to have only one level, "a". It seems
> > to me that the code in [.factor does not follow the advice in
> > help("factor") on how to set factor codes to be missing when
> > "NA" is a level of the factor.
>
>
> Something odd is going on, that's for sure...
>
> The problem is also there with factor(f1). And the logic in
> as.character.factor seems to be at the root of it:
>
> > as.character.factor
> function (x, ...)
> {
>     cx <- levels(x)[x]
>     if ("NA" %in% levels(x))
>         cx[is.na(x)] <- "<NA>"
>     cx
> }
>
> This looks like something from before we had character NA values. I
> wonder if it is a mistake or there could actually be a reason to
> keep it.
>
>
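
The effect of the quoted logic can be reproduced without depending on which version of as.character.factor is installed. The sketch below copies the function body quoted above into a stand-alone function; the name old_as_character_factor and the variables with_na_level and without_na_level are hypothetical, chosen only for this illustration.

```r
# Stand-alone copy of the logic quoted above (hypothetical name; this is
# not the function shipped with current R).
old_as_character_factor <- function(x, ...) {
    cx <- levels(x)[x]              # index the levels by the integer codes
    if ("NA" %in% levels(x))
        cx[is.na(x)] <- "<NA>"      # missing codes become the string "<NA>"
    cx
}

# "NA" is a level here, so the missing code is turned into a string.
with_na_level <- old_as_character_factor(factor(c("NA", "CD", NA)))

# No "NA" level here, so the missing code stays a real NA.
without_na_level <- old_as_character_factor(factor(c("AB", "CD", NA)))

is.na(with_na_level)[3]
is.na(without_na_level)[3]
```

With this logic, a missing code comes back as the non-missing string "<NA>" whenever "NA" is among the levels. That would explain how an extra level can appear when the factor is rebuilt from its character representation, as in the f1[, drop = TRUE] and factor(f1) cases above.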
--
Brian D. Ripley, ripley at stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595