[Rd] Bug in tapply with factors containing NAs (PR#6672)
Prof Brian D Ripley
ripley at stats.ox.ac.uk
Mon Mar 15 12:18:07 MET 2004
On Mon, 15 Mar 2004 george.leigh at dpi.qld.gov.au wrote:
> Full_Name: George Leigh
> Version: 1.8.1
> OS: Windows 2000
> Submission from: (NULL) (203.25.1.208)
>
>
> The following example gives the correct answer when the first argument of tapply
> is a numeric vector, but an incorrect answer when it is a factor. If the
> function used by tapply is "length", the type and contents of the first argument
> should make no difference, provided it has the same length as the second
> argument.
Not so:
> split(x, y)
$"1"
[1] 1
> split(y, y)
$"1"
[1] <NA> 1
Levels: 1
Note that as y has only one level, the NA in y must be 1, whereas the NA
in x does not have to be. So the answer for a factor in your problem is
definitely correct, if fortuitous.
R does the same as S in this example.
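The split() output matters because the grouping step determines what each call to length sees. A minimal sketch for the numeric case only, assuming tapply groups its first argument the same way split() does (the names by_split and by_tapply are illustrative, not from the report):

```r
x <- c(NA, 1)
y <- factor(x)    # a single level, "1"; the NA is not a level

# Group x by y first, then take the length of each group.
by_split <- sapply(split(x, y), length)

# tapply performs the grouping and the function call in one step.
by_tapply <- tapply(x, y, length)

print(by_split)
print(by_tapply)
```

Both calls report one element in group "1" for the numeric x, matching the first tapply result in the report.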
If there were more than one level in y, the issue is less clear-cut.
Probably y[[k]] <- x[f == k] in split.default should be x[f %in% k].
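The difference between the two subsetting expressions can be seen directly. This is a small sketch, with f and k standing in for the local variables of split.default and the other names chosen for illustration only:

```r
x <- c(NA, 1)
f <- factor(x)    # grouping factor: values NA and "1", one level
k <- "1"          # the level currently being extracted

# '==' propagates the missing value: the index is c(NA, TRUE),
# and an NA in a logical index yields an NA element in the result.
with_eq <- x[f == k]

# '%in%' never returns NA: the index is c(FALSE, TRUE),
# so the element whose group is unknown is dropped.
with_in <- x[f %in% k]

print(with_eq)
print(with_in)
```

With `==` the element whose group is missing is carried into the result as NA; with `%in%` it is left out.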
Note too:
> z <- x; class(z) <- "foo"
> split(z, y)
$"1"
[1] NA 1
> > x = c(NA, 1)
> > y = factor(x)
> > tapply(x, y, length)
> 1
> 1
> > tapply(y, y, length)
> 1
> 2
--
Brian D. Ripley, ripley at stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272860 (secr)
Oxford OX1 3TG, UK Fax: +44 1865 272595