[R] Warning: as.numeric reorders factor data

Thomas Lumley tlumley at u.washington.edu
Mon Dec 9 00:25:03 CET 2002


On Sun, 8 Dec 2002, Bud Gibson wrote:

> Thanks for the clarification.  It's nice to know that there is some
> systematicity to the behavior.
>
> Is this documented anywhere?  I did look at the help for as.numeric, and
> it makes no mention that it is coercing factors based on their level.

Well, the help page for as.numeric says

     `as.numeric' for factors yields the codes underlying the factor
     levels, not the numeric representation of the labels.
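
To make that concrete, here is a small sketch (the vector `f` is made up for illustration, not taken from the original data) showing the difference between the codes and the labels, plus the usual way of getting the labels back as numbers:

```r
# A factor built from numbers stored as text; levels are sorted as strings
f <- factor(c("10", "5", "20", "5"))

levels(f)                      # "10" "20" "5"
as.numeric(f)                  # the underlying codes: 1 3 2 3

# Go through the labels to recover the numbers you see on screen
as.numeric(as.character(f))    # 10 5 20 5
as.numeric(levels(f))[f]       # same result, converting each level only once
```

Both conversions in the last two lines give the same answer; the second one converts the levels once and then indexes by the codes.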

>  This may be obvious to those deeply immersed in R and its machinations,
> but to those who think the number they see on the screen should just
> become a number when it is coerced to one, it is disconcerting.

Yes, it is. It might have been better if at the dawn of time codes() had
been defined to do what as.numeric does and as.numeric to do what you
expect.  However, it's not completely obvious: what should as.numeric do
with a factor of postal codes whose levels are "3163", "90210" and "OX1 3DP"?
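
As a sketch of the dilemma (the factor `postal` below is just the three codes above, wrapped in a factor for illustration), neither answer is entirely satisfactory:

```r
postal <- factor(c("3163", "90210", "OX1 3DP"))

# Current behaviour: the integer codes, which say nothing about the labels
as.numeric(postal)                                   # 1 2 3

# Converting the labels works only where a label happens to look like a
# number; "OX1 3DP" becomes NA (with a warning, suppressed here)
suppressWarnings(as.numeric(as.character(postal)))   # 3163 90210 NA
```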


> Further, if I just factor the same vector, and then coerce it back to
> numeric, the order I would have expected is preserved.  I did not report
> that test because it seemed irrelevant.  Why isn't aggregate just doing
> that?

Because when you have more than one `by' variable in aggregate it needs to
make a factor of the combined levels, which it does by pasting them
together as characters.
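
The following is only a sketch of that mechanism, not the actual source of aggregate; the variables `g1`, `g2` and `key` are hypothetical. It shows why pasting numeric values into strings changes the order of the resulting levels:

```r
# Two grouping variables, one numeric and one character
g1 <- c(5, 10, 20, 5)
g2 <- c("a", "a", "b", "b")

# Combine them into a single key by pasting, then make a factor of it
key <- factor(paste(g1, g2, sep = "."))

# The levels are sorted as strings, so "10.a" comes before "5.a"
levels(key)       # "10.a" "20.b" "5.a" "5.b"
as.numeric(key)   # 3 1 2 4
```

Once the numbers have been through paste() they are ordered as text, which is where the unexpected reordering comes from.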

> My cut is that there should be some warning in the documentation,
> perhaps in aggregate, about the specific assumptions used in making
> implicit transformations and what one can expect.

It might be worth having help(aggregate) mention that the variables are
turned into factors.

	-thomas



