[R] factor documentation issue
Peter Dalgaard
p.dalgaard at biostat.ku.dk
Wed Feb 28 06:56:35 CET 2007
Geoff Russell wrote:
> There is a warning in the documentation for ?factor (R version 2.3.0)
> as follows:
>
> " The interpretation of a factor depends on both the codes and the
> '"levels"' attribute. Be careful only to compare factors with the
> same set of levels (in the same order). In particular,
> 'as.numeric' applied to a factor is meaningless, and may happen by
> implicit coercion. To "revert" a factor 'f' to its original
> numeric values, 'as.numeric(levels(f))[f]' is recommended and
> slightly more efficient than 'as.numeric(as.character(f))'."
>
>
> But as.numeric seems to work fine whereas as.numeric(levels(f))[f] doesn't
> always do anything useful.
>
> For example:
>
>
> > f<-factor(1:3,labels=c("A","B","C"))
> > f
> [1] A B C
> Levels: A B C
>
> > as.numeric(f)
> [1] 1 2 3
>
> > as.numeric(levels(f))[f]
> [1] NA NA NA
> Warning message:
> NAs introduced by coercion
>
> And also,
>
>
> > f<-factor(1:3,labels=c(1,5,6))
> > f
> [1] 1 5 6
> Levels: 1 5 6
>
> > as.numeric(f)
> [1] 1 2 3
>
> > as.numeric(levels(f))[f]
> [1] 1 5 6
>
> Is the documentation wrong, or is the code wrong, or have I missed
> something?
>
The documentation is somewhat unclear: the last sentence presupposes
that the factor was generated from numeric data, i.e. the
factor(c(7,9,13)) syndrome, where as.numeric(f) returns the internal
codes rather than the original values:

    > f <- factor(c(7,9,13))
    > f
    [1] 7 9 13
    Levels: 7 9 13
    > as.numeric(f)
    [1] 1 2 3
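
To make the contrast explicit, here is a minimal sketch of the two idioms the help page mentions, applied to the same kind of factor. The variable names (codes, viaLev, viaChr) are only illustrative.

```r
# A factor built from numeric data: its levels are the sorted unique
# values, stored as character strings.
f <- factor(c(7, 9, 13))

codes  <- as.numeric(f)                # internal integer codes, not the data
viaLev <- as.numeric(levels(f))[f]     # convert each level once, then index by code
viaChr <- as.numeric(as.character(f))  # convert the label of every element

print(codes)   # 1 2 3
print(viaLev)  # 7 9 13
print(viaChr)  # 7 9 13
```

With non-numeric labels such as "A", "B", "C", the same idiom gives NA, because the labels are all the factor remembers and they cannot be converted to numbers.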
Also, the statement that as.numeric(f) is meaningless is a bit strong.
It should probably say "meaningless without knowledge of the levels and
their order". And you can actually compare factors that have the same
set of levels in a different order, because == compares the level
labels rather than the codes:

    > g <- factor(c("7",9,13))
    > g
    [1] 7 9 13
    Levels: 13 7 9
    > f==g
    [1] TRUE TRUE TRUE
    > as.numeric(f)==as.numeric(g)
    [1] FALSE FALSE FALSE
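
The following sketch, which only restates the example above, shows where the difference comes from: the two factors carry the same labels but were given different level orders, so their integer codes disagree.

```r
f <- factor(c(7, 9, 13))    # numeric input: levels sorted numerically
g <- factor(c("7", 9, 13))  # character input: levels sorted as strings

print(levels(f))      # "7"  "9"  "13"
print(levels(g))      # "13" "7"  "9"

print(as.integer(f))  # 1 2 3
print(as.integer(g))  # 2 3 1

# == on two factors with the same set of levels compares the labels
print(f == g)                              # TRUE TRUE TRUE
print(as.character(f) == as.character(g))  # TRUE TRUE TRUE
```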
Where you need to be careful is when you do things like

    sexsymbols <- c(16, 19)
    plot(x, y, pch=sexsymbols[sex])

Then you should also do

    legend(x0, y0, legend=levels(sex), pch=sexsymbols)

in order to be sure the symbols match the legend. (Notice that indexing
with [sex] implicitly coerces sex to its integer codes.)
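
A self-contained version of this pattern might look like the sketch below. The data values and the "topleft" legend position are made up for illustration and stand in for x, y, sex, x0 and y0 above.

```r
# Hypothetical data, invented only for this illustration.
sex <- factor(c("male", "female", "female", "male"))
x <- c(1.2, 2.5, 3.1, 4.0)
y <- c(2.0, 1.5, 3.7, 2.9)

sexsymbols <- c(16, 19)   # one plotting symbol per level, in level order
pch <- sexsymbols[sex]    # indexing by a factor uses its integer codes

plot(x, y, pch = pch)
# levels(sex) lists the labels in the same order as sexsymbols,
# so each legend entry is paired with the symbol actually plotted.
legend("topleft", legend = levels(sex), pch = sexsymbols)
```

Because both the symbols and the legend labels are taken in level order, the pairing stays correct even if the levels of sex are later reordered.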