[Rd] Bug in tapply with factors containing NAs (PR#6672)
Peter Dalgaard
p.dalgaard at biostat.ku.dk
Mon Mar 15 12:20:22 MET 2004
george.leigh at dpi.qld.gov.au writes:
> Full_Name: George Leigh
> Version: 1.8.1
> OS: Windows 2000
> Submission from: (NULL) (203.25.1.208)
>
>
> The following example gives the correct answer when the first argument of tapply
> is a numeric vector, but an incorrect answer when it is a factor. If the
> function used by tapply is "length", the type and contents of the first argument
> should make no difference, provided it has the same length as the second
> argument.
>
> > x = c(NA, 1)
> > y = factor(x)
> > tapply(x, y, length)
> 1
> 1
> > tapply(y, y, length)
> 1
> 2
> >
The core of this is that
> split(y,y)
$"1"
[1] <NA> 1
Levels: 1
> split(x,y)
$"1"
[1] 1
which in turn comes from the innards of split.default:
    ...
    if (is.null(attr(x, "class")) && is.null(names(x)))
        return(.Internal(split(x, f)))
    lf <- levels(f)
    y <- vector("list", length(lf))
    names(y) <- lf
    for (k in lf) y[[k]] <- x[f == k]
    y
Factors have a class attribute, so the internal code is not used in
that case; the R-level loop runs instead, and
> y[y=="1"]
[1] <NA> 1
Levels: 1
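The extra <NA> comes from how logical subscripts treat NA. A minimal sketch of that step, assuming only the x and y from the report (the names idx and picked are made up here for illustration):

```r
x <- c(NA, 1)
y <- factor(x)        # one level, "1"; the first element is NA

# Comparing an NA factor entry with a level gives NA, not FALSE
idx <- y == "1"
print(idx)

# An NA in a logical subscript selects an NA element instead of dropping it
picked <- y[idx]
print(picked)
print(length(picked))
```

So the group for level "1" picks up one element for every NA in the factor, which is why length reports 2.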
I think the line in split.default needs to read
    for (k in lf) y[[k]] <- x[!is.na(f) & f == k]
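To see the effect of that change without touching split.default itself, here is a rough sketch. The function split_by_loop and its guard_na argument are hypothetical names used only for this illustration; the body merely imitates the loop quoted above and is not the actual R source.

```r
# Hypothetical stand-in for the R-level loop in split.default (illustration only)
split_by_loop <- function(x, f, guard_na = TRUE) {
    lf <- levels(f)
    out <- vector("list", length(lf))
    names(out) <- lf
    for (k in lf) {
        # With guard_na, NA entries of f become FALSE rather than NA
        keep <- if (guard_na) !is.na(f) & f == k else f == k
        out[[k]] <- x[keep]
    }
    out
}

x <- c(NA, 1)
y <- factor(x)

unguarded <- split_by_loop(y, y, guard_na = FALSE)  # loop as quoted
guarded   <- split_by_loop(y, y, guard_na = TRUE)   # loop with the proposed line

print(length(unguarded[["1"]]))  # 2, the NA is carried along
print(length(guarded[["1"]]))    # 1, matching split(x, y)
```

The guard works because FALSE & NA is FALSE, so positions where f is NA are excluded outright.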
--
O__ ---- Peter Dalgaard Blegdamsvej 3
c/ /'_ --- Dept. of Biostatistics 2200 Cph. N
(*) \(*) -- University of Copenhagen Denmark Ph: (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk) FAX: (+45) 35327907