# [R] factor with numeric names

Saiwing Yeung saiwing at berkeley.edu
Wed Mar 25 13:46:22 CET 2009

```
Thank you so much both for the answer. I think I have a better handle
on this now. Yes, Loblolly$Seed is an ordered factor, but I didn't
realize that the default contrast for ordered factors is contr.poly.

And then I was further confused because I didn't realize the
coefficient names generated (not just the model) are different
depending on whether there is an intercept term (even though they were
both "contr.poly").

> lm(formula = height ~ age + Seed, data = Loblolly)

Call:
lm(formula = height ~ age + Seed, data = Loblolly)

Coefficients:
(Intercept)          age       Seed.L       Seed.Q       Seed.C       Seed^4
   -1.31240      2.59052      4.86941      0.87307      0.37894     -0.46853
     Seed^5       Seed^6       Seed^7       Seed^8       Seed^9      Seed^10
    0.55237      0.39659     -0.06507      0.35074     -0.83442      0.42085
    Seed^11      Seed^12      Seed^13
    0.53906     -0.29803     -0.77254

> lm(formula = height ~ age + Seed - 1, data = Loblolly)

Call:
lm(formula = height ~ age + Seed - 1, data = Loblolly)

Coefficients:
    age  Seed329  Seed327  Seed325  Seed307  Seed331  Seed311  Seed315  Seed321
 2.5905  -3.3635  -3.0701  -1.7535  -2.3485  -2.6568  -2.0235  -1.3168  -2.4651
Seed319  Seed301  Seed323  Seed309  Seed303  Seed305
-0.7951  -0.4301  -0.1235   0.1049   0.4299   1.4382

This should have been obvious to me...
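
# A minimal sketch of why the names differ (not from the original thread;
# mm.int and mm.noint are made-up names). It assumes the default
# options("contrasts") and uses the Loblolly data that ships with R.
getOption("contrasts")   # which contrast function is used for unordered / ordered factors
mm.int   <- model.matrix(height ~ age + Seed,     data = Loblolly)
mm.noint <- model.matrix(height ~ age + Seed - 1, data = Loblolly)
colnames(mm.int)[3:5]     # polynomial columns: "Seed.L" "Seed.Q" "Seed.C"
colnames(mm.noint)[2:4]   # one indicator per level: "Seed329" "Seed327" "Seed325"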

(for the sake of completeness) I think factor() doesn't change the
"ordered-ness"

# as.factor(Loblolly$Seed) doesn't remove the ordered-ness
> str(Loblolly$Seed)
Ord.factor w/ 14 levels "329"<"327"<"325"<..: 10 10 10 10 10 10 13 13 13 13 ...
> str(as.factor(Loblolly$Seed))
Ord.factor w/ 14 levels "329"<"327"<"325"<..: 10 10 10 10 10 10 13 13 13 13 ...

# this works though
> str(factor(Loblolly$Seed, ordered=FALSE))
Factor w/ 14 levels "329","327","325",..: 10 10 10 10 10 10 13 13 13 13 ...
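
# A minimal sketch of one tidier alternative to pasting "S" onto the levels
# (not from the original thread; Seed.unord, fit.unord and fit.treat are
# made-up names). Assumes the default options("contrasts").
Seed.unord <- factor(Loblolly$Seed, ordered = FALSE)   # same levels, no ordering
fit.unord  <- lm(height ~ age + Seed.unord, data = Loblolly)
names(coef(fit.unord))[1:3]   # "(Intercept)" "age" "Seed.unord327"

# Alternatively, leave Seed ordered and request treatment contrasts in the call.
fit.treat <- lm(height ~ age + Seed, data = Loblolly,
                contrasts = list(Seed = "contr.treatment"))
names(coef(fit.treat))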

Saiwing

On Mar 21, 2009, at 3:35 PM, John Fox wrote:

> Dear Saiwing Yeung,
>
> You appear to be using orthogonal-polynomial contrasts (generated by
> contr.poly) for Seed, which suggests that Seed is either an ordered
> factor or that you've assigned these contrasts to it. Because Seed has
> 14 levels, you end up fitting a degree-13 polynomial. If Seed is indeed
> an ordered factor and you want to use contr.treatment instead then you
> could, e.g., set Loblolly$Seed <- as.factor(Loblolly$Seed). (If I'm
> right about Seed being an ordered factor, your solution worked because
> it changed Seed to a factor, not because it used non-numeric level
> names.)
>
> I hope this helps,
> John
>
>> -----Original Message-----
>> From: r-help-bounces at r-project.org
>> [mailto:r-help-bounces at r-project.org] On Behalf Of Saiwing Yeung
>> Sent: March-21-09 5:02 PM
>> To: r-help at r-project.org
>> Subject: [R] factor with numeric names
>>
>> Hi all,
>>
>> I have a pretty basic question about categorical variables, but I
>> can't seem to find an answer, so I am hoping someone here can help.
>> I found that if the factor level names are all numbers, fitting the
>> model in lm returns labels that are not very recognizable.
>>
>> # Example: let's just assume that we want to fit this model
>> fit <- lm(height ~ age + Seed, data=Loblolly)
>>
>> # See the category names are all mangled up here
>> fit
>>
>>
>> Call:
>> lm(formula = height ~ age + Seed, data = Loblolly)
>>
>> Coefficients:
>> (Intercept)          age       Seed.L       Seed.Q       Seed.C       Seed^4
>>    -1.31240      2.59052      4.86941      0.87307      0.37894     -0.46853
>>      Seed^5       Seed^6       Seed^7       Seed^8       Seed^9      Seed^10
>>     0.55237      0.39659     -0.06507      0.35074     -0.83442      0.42085
>>     Seed^11      Seed^12      Seed^13
>>     0.53906     -0.29803     -0.77254
>>
>>
>>
>> One possible solution I found is to rename the categorical variables
>>
>> seed.str <- paste("S", Loblolly$Seed, sep="")
>> seed.str <- factor(seed.str)
>> fit <- lm(height ~ age + seed.str, data=Loblolly)
>> fit
>>
>>
>>
>> Call:
>> lm(formula = height ~ age + seed.str, data = Loblolly)
>>
>> Coefficients:
>>  (Intercept)           age  seed.strS303  seed.strS305  seed.strS307
>>      -0.4301        2.5905        0.8600        1.8683       -1.9183
>> seed.strS309  seed.strS311  seed.strS315  seed.strS319  seed.strS321
>>       0.5350       -1.5933       -0.8867       -0.3650       -2.0350
>> seed.strS323  seed.strS325  seed.strS327  seed.strS329  seed.strS331
>>       0.3067       -1.3233       -2.6400       -2.9333       -2.2267
>>
>>
>> Now it is actually possible to see which one is which, but it is kind
>> of lame. Can someone point me to a more elegant solution? Thank you so
>> much.
>>
>> Saiwing Yeung