[R] factor documentation issue

Peter Dalgaard p.dalgaard at biostat.ku.dk
Wed Feb 28 06:56:35 CET 2007


Geoff Russell wrote:
> There is a warning in the documentation for ?factor  (R version 2.3.0)
> as follows:
>
> " The interpretation of a factor depends on both the codes and the
>   '"levels"' attribute.  Be careful only to compare factors with the
>   same set of levels (in the same order).  In particular,
>   'as.numeric' applied to a factor is meaningless, and may happen by
>   implicit coercion.  To "revert" a factor 'f' to its original
>   numeric values, 'as.numeric(levels(f))[f]' is recommended and
>   slightly more efficient than 'as.numeric(as.character(f))'."
>
>
> But as.numeric seems to work fine whereas as.numeric(levels(f))[f] doesn't
> always do anything useful.
>
> For example:
>
> > f<-factor(1:3,labels=c("A","B","C"))
> > f
> [1] A B C
> Levels: A B C
> > as.numeric(f)
> [1] 1 2 3
> > as.numeric(levels(f))[f]
> [1] NA NA NA
> Warning message:
> NAs introduced by coercion
>
> And also,
>
> > f<-factor(1:3,labels=c(1,5,6))
> > f
> [1] 1 5 6
> Levels: 1 5 6
> > as.numeric(f)
> [1] 1 2 3
> > as.numeric(levels(f))[f]
> [1] 1 5 6
>
> Is the documentation wrong, or is the code wrong, or have I missed
> something?
>   

The documentation is somewhat unclear: the last sentence presupposes 
that the factor was generated from numeric data, i.e. the 
factor(c(7,9,13)) syndrome, where as.numeric() returns the codes rather 
than the original values:

 > f <- factor (c(7,9,13))
 > f
[1] 7  9  13
Levels: 7 9 13
 > as.numeric(f)
[1] 1 2 3
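
For a factor built from numbers like this one, both idioms quoted from the help page recover the original values. A minimal sketch (the names codes, via_levels and via_character are chosen here for illustration only):

```r
f <- factor(c(7, 9, 13))

# Integer codes: positions in levels(f), not the original data.
codes <- as.numeric(f)

# Convert the levels once, then index the result by the codes.
via_levels <- as.numeric(levels(f))[f]

# Convert every element to its label, then to a number.
via_character <- as.numeric(as.character(f))

print(codes)          # 1 2 3
print(via_levels)     # 7 9 13
print(via_character)  # 7 9 13
```

If the labels are not numbers, as with labels=c("A","B","C"), neither idiom has anything numeric to recover and both give NA.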

Also, the statement that as.numeric(f) is meaningless is a bit strong. 
It should probably say "meaningless without knowledge of the levels and 
their order". And you can actually compare factors whose levels are in a 
different order:

 > g <- factor (c("7",9,13))
 > g
[1] 7  9  13
Levels: 13 7 9
 > f==g
[1] TRUE TRUE TRUE
 > as.numeric(f)==as.numeric(g)
[1] FALSE FALSE FALSE
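
The comparison of f and g succeeds because it is effectively done on the labels, while the codes depend on the level order. If the codes themselves need to agree, one option is to rebuild g with the level order of f. A sketch, with g2 and the result names introduced here only for illustration:

```r
f <- factor(c(7, 9, 13))
g <- factor(c("7", 9, 13))   # levels sort as strings: "13" "7" "9"

# Same labels, different codes.
same_labels <- as.character(f) == as.character(g)
same_codes  <- as.integer(f) == as.integer(g)

# Rebuild g using f's level order; the codes then agree.
g2 <- factor(g, levels = levels(f))
realigned <- as.integer(f) == as.integer(g2)

print(same_labels)  # TRUE TRUE TRUE
print(same_codes)   # FALSE FALSE FALSE
print(realigned)    # TRUE TRUE TRUE
```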

Where you do need to be careful: if you do things like
   sexsymbols <- c(16, 19)
   plot(x, y, pch=sexsymbols[sex])
then you should also do
   legend(x0, y0, legend=levels(sex), pch=sexsymbols)
in order to be sure the symbols match the legend. (Notice that indexing 
with [sex] implicitly coerces sex to numeric.)
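
The indexing step can be checked without drawing anything. In this sketch the data in sex are made up for illustration, and codes, pch_per_point and key are hypothetical names:

```r
# Made-up data; the levels sort alphabetically: "female" "male".
sex <- factor(c("male", "female", "female", "male"))
sexsymbols <- c(16, 19)

# Indexing by a factor uses its integer codes, not its labels.
codes <- as.integer(sex)
pch_per_point <- sexsymbols[sex]

# Pairing levels(sex) with sexsymbols, in that order, is exactly what
# legend(..., legend=levels(sex), pch=sexsymbols) displays.
key <- data.frame(level = levels(sex), pch = sexsymbols)

print(codes)          # 2 1 1 2
print(pch_per_point)  # 19 16 16 19
print(key)
```

Here "female" is level 1 and gets symbol 16, and "male" is level 2 and gets symbol 19, so a legend built from levels(sex) matches the plotted points.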


