[R] factor with numeric names

Saiwing Yeung saiwing at berkeley.edu
Wed Mar 25 13:46:22 CET 2009


Thank you both so much for the answers. I think I have a better handle
on this now. Yes, Loblolly$Seed is an ordered factor, but I didn't
realize that the default contrast for ordered factors is contr.poly.
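
For anyone finding this thread later, here is a minimal sketch of how one might check this. It assumes the built-in Loblolly data and that the "contrasts" option has not been changed from its default in the session.

```r
# Loblolly ships with R (package 'datasets')
data(Loblolly)

# Seed is stored as an ordered factor
is.ordered(Loblolly$Seed)

# The "contrasts" option holds two names: the first is used for
# unordered factors, the second for ordered factors
getOption("contrasts")

# Column names that contr.poly generates for a 14-level factor;
# these are the suffixes appended to "Seed" in the coefficient names
colnames(contr.poly(14))
```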

And then I was further confused because I didn't realize that the
coefficient names generated (not just the fitted model) differ
depending on whether there is an intercept term, even though the
contrasts for Seed were "contr.poly" in both cases.

 > lm(formula = height ~ age + Seed, data = Loblolly)

Call:
lm(formula = height ~ age + Seed, data = Loblolly)

Coefficients:
(Intercept)          age       Seed.L       Seed.Q       Seed.C       Seed^4
   -1.31240      2.59052      4.86941      0.87307      0.37894     -0.46853
     Seed^5       Seed^6       Seed^7       Seed^8       Seed^9      Seed^10
    0.55237      0.39659     -0.06507      0.35074     -0.83442      0.42085
    Seed^11      Seed^12      Seed^13
    0.53906     -0.29803     -0.77254

 > lm(formula = height ~ age + Seed - 1, data = Loblolly)

Call:
lm(formula = height ~ age + Seed - 1, data = Loblolly)

Coefficients:
    age  Seed329  Seed327  Seed325  Seed307  Seed331  Seed311  Seed315  Seed321
 2.5905  -3.3635  -3.0701  -1.7535  -2.3485  -2.6568  -2.0235  -1.3168  -2.4651
Seed319  Seed301  Seed323  Seed309  Seed303  Seed305
-0.7951  -0.4301  -0.1235   0.1049   0.4299   1.4382


This should have been obvious to me...
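
As far as I understand it, the difference arises because, without an intercept, the first factor in the formula is coded with one indicator column per level instead of through its contrasts. A small sketch that only inspects the design-matrix column names (the names mm1 and mm0 are just for illustration):

```r
data(Loblolly)

# With an intercept, Seed is coded through its contrasts (contr.poly here),
# giving 13 columns named after the polynomial terms
mm1 <- model.matrix(height ~ age + Seed, data = Loblolly)
colnames(mm1)

# Without an intercept, Seed gets one indicator column per level,
# so the columns are named after the levels themselves
mm0 <- model.matrix(height ~ age + Seed - 1, data = Loblolly)
colnames(mm0)
```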


(For the sake of completeness) I think as.factor() doesn't change the
"ordered-ness", and neither does factor() with its default arguments:

# as.factor(Loblolly$Seed) doesn't remove the ordered-ness
 > str(Loblolly$Seed)
  Ord.factor w/ 14 levels "329"<"327"<"325"<..: 10 10 10 10 10 10 13 13 13 13 ...
 > str(as.factor(Loblolly$Seed))
  Ord.factor w/ 14 levels "329"<"327"<"325"<..: 10 10 10 10 10 10 13 13 13 13 ...

# this works though (FALSE spelled out, since F can be reassigned)
 > str(factor(Loblolly$Seed, ordered=FALSE))
  Factor w/ 14 levels "329","327","325",..: 10 10 10 10 10 10 13 13 13 13 ...
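
Putting this together, here is a sketch of two ways one might get treatment-contrast coefficients labelled by seed number without renaming the levels. The object names fit_a and fit_b are only for illustration, and I am assuming the default contrasts option.

```r
data(Loblolly)

# Option A: drop the ordering inside the formula. This works, but the
# coefficient names carry the whole factor(...) expression as a prefix.
fit_a <- lm(height ~ age + factor(Seed, ordered = FALSE), data = Loblolly)

# Option B: leave Seed ordered and ask lm() for treatment contrasts
# on that one term through its 'contrasts' argument.
fit_b <- lm(height ~ age + Seed, data = Loblolly,
            contrasts = list(Seed = "contr.treatment"))

# Coefficient names from option B are "Seed" followed by the level
names(coef(fit_b))

# Both options describe the same fit
all.equal(unname(coef(fit_a)), unname(coef(fit_b)))
```

In both cases the first level of Seed serves as the baseline, so it has no coefficient of its own.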


Saiwing



On Mar 21, 2009, at 3:35 PM, John Fox wrote:

> Dear Saiwing Yeung,
>
> You appear to be using orthogonal-polynomial contrasts (generated by
> contr.poly) for Seed, which suggests that Seed is either an ordered factor
> or that you've assigned these contrasts to it. Because Seed has 14 levels,
> you end up fitting a degree-13 polynomial. If Seed is indeed an ordered
> factor and you want to use contr.treatment instead then you could, e.g., set
> Loblolly$Seed <- as.factor(Loblolly$Seed). (If I'm right about Seed being an
> ordered factor, your solution worked because it changed Seed to a factor,
> not because it used non-numeric level names.)
>
> I hope this helps,
> John
>
>> -----Original Message-----
>> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org]
>> On Behalf Of Saiwing Yeung
>> Sent: March-21-09 5:02 PM
>> To: r-help at r-project.org
>> Subject: [R] factor with numeric names
>>
>> Hi all,
>>
>> I have a pretty basic question about categorical variables but I can't
>> seem to be able to find an answer, so I am hoping someone here can help.
>> I found that if the factor level names are all numbers, fitting the model
>> in lm would return labels that are not very recognizable.
>>
>> # Example: let's just assume that we want to fit this model
>> fit <- lm(height ~ age + Seed, data=Loblolly)
>>
>> # See the category names are all mangled up here
>> fit
>>
>>
>> Call:
>> lm(formula = height ~ age + Seed, data = Loblolly)
>>
>> Coefficients:
>> (Intercept)          age       Seed.L       Seed.Q       Seed.C       Seed^4
>>    -1.31240      2.59052      4.86941      0.87307      0.37894     -0.46853
>>      Seed^5       Seed^6       Seed^7       Seed^8       Seed^9      Seed^10
>>     0.55237      0.39659     -0.06507      0.35074     -0.83442      0.42085
>>     Seed^11      Seed^12      Seed^13
>>     0.53906     -0.29803     -0.77254
>>
>>
>>
>> One possible solution I found is to rename the categorical variables
>>
>> seed.str <- paste("S", Loblolly$Seed, sep="")
>> seed.str <- factor(seed.str)
>> fit <- lm(height ~ age + seed.str, data=Loblolly)
>> fit
>>
>>
>>
>> Call:
>> lm(formula = height ~ age + seed.str, data = Loblolly)
>>
>> Coefficients:
>>  (Intercept)           age  seed.strS303  seed.strS305  seed.strS307
>>      -0.4301        2.5905        0.8600        1.8683       -1.9183
>> seed.strS309  seed.strS311  seed.strS315  seed.strS319  seed.strS321
>>       0.5350       -1.5933       -0.8867       -0.3650       -2.0350
>> seed.strS323  seed.strS325  seed.strS327  seed.strS329  seed.strS331
>>       0.3067       -1.3233       -2.6400       -2.9333       -2.2267
>>
>>
>> Now it is actually possible to see which one is which, but it is kind of
>> lame. Can someone point me to a more elegant solution? Thank you so
>> much.
>>
>> Saiwing Yeung



