[R] Warning: as.numeric reorders factor data

Bud Gibson fpgibson at umich.edu
Mon Dec 9 00:54:03 CET 2002


The behavior makes more sense now, but it needs to be clarified in the 
help files.  

Specifically, the help for aggregate should mention that it converts the 
by variables to characters.  Factoring a numeric vector gives what you 
might expect: levels ordered numerically.  So, even though I knew the by 
variables were being factored, it seemed they should be okay.  For instance,

 > factor(c(1:15))
 [1] 1  2  3  4  5  6  7  8  9  10 11 12 13 14 15
Levels: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15

Ultimately, after much staring at the output, it occurred to me that 
factoring must be doing a character conversion first, and that is how I 
figured out my workaround.  As it turns out, the workaround is in the 
FAQ (albeit filed under Miscellanea).
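
To make the difference concrete, here is a minimal sketch with made-up values (the vector x and the names f_num and f_chr are illustrative, not taken from the original data). It assumes the FAQ workaround referred to above is the usual as.numeric(as.character(f)) idiom.

```r
# Illustrative values only; not the data from the original report.
x <- c(1, 2, 10, 20)

f_num <- factor(x)                # levels sorted numerically
f_chr <- factor(as.character(x))  # levels sorted as strings

levels(f_num)
levels(f_chr)

# as.numeric() on a factor returns the internal level codes,
# not the numbers shown in the labels.
codes <- as.numeric(f_chr)

# Going through the labels recovers the original values.
via_labels <- as.numeric(as.character(f_chr))

# An equivalent form that converts each level only once.
via_levels <- as.numeric(levels(f_chr))[f_chr]
```

With character input the level "10" sorts ahead of "2", so the codes no longer follow the numeric order of x, while both label-based conversions return the original numbers.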

So, ultimately the problem may not be in as.numeric.  Since one needs to 
know something about the internal processing of aggregate to use it, it 
might be that this should be in the help files.
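
The mechanism Thomas describes in the quoted reply below (pasting several by variables together as characters) can be sketched roughly as follows. This is not the source of aggregate, whose internals may differ between R versions; by1, by2 and key are hypothetical names with made-up values.

```r
# Hypothetical by variables; the values are made up for illustration.
by1 <- c(2, 10, 2, 10)
by2 <- c(1, 1, 2, 2)

# Combine the two variables into one key by pasting them as characters.
key <- factor(paste(by1, by2))

# The levels are sorted as strings, so keys starting with "10"
# come before keys starting with "2".
levels(key)

# Converting the key straight to numeric returns level codes,
# which do not follow the numeric order of by1.
key_codes <- as.numeric(key)
```

Any grouping built from such a pasted key inherits the string ordering, which is why numeric by variables can come back in an unexpected order.
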

Thomas Lumley wrote:

>On Sun, 8 Dec 2002, Bud Gibson wrote:
>
>>Thanks for the clarification.  It's nice to know that there is some
>>systematicity to the behavior.
>>
>>Is this documented anywhere?  I did look at the help for as.numeric, and
>>it makes no mention that it is coercing factors based on their level.
>
>Well, the help page for as.numeric says
>
>     `as.numeric' for factors yields the codes underlying the factor
>     levels, not the numeric representation of the labels.
>
>>This may be obvious to those deeply immersed in R and its machinations,
>>but to those who think the number they see on the screen should just
>>become a number when it is coerced to one, it is disconcerting.
>
>Yes it is. It might have been better if at the dawn of time codes() had
>been defined to do what as.numeric does and as.numeric to do what you
>expect.   However, it's not completely obvious: what should as.numeric do
>with a factor of postal codes whose levels are "3163" "90210" and "OX1 3DP"?
>
>>Further, if I just factor the same vector, and then coerce it back to
>>numeric, the order I would have expected is preserved.  I did not report
>>that test because it seemed irrelevant.  Why isn't aggregate just doing
>>that?
>
>Because when you have more than one `by' variable in aggregate it needs to
>make a factor of the combined levels, which it does by pasting them
>together as characters.
>
>>My cut is that there should be some warning in the documentation,
>>perhaps in aggregate, about the specific assumptions used in making
>>implicit transformations and what one can expect.
>
>It might be worth help(aggregate) mentioning that the variables are turned
>into factors.
>
>	-thomas