[R] factor with numeric names

Saiwing Yeung saiwing at berkeley.edu
Sat Mar 21 22:02:13 CET 2009


Hi all,

I have a pretty basic question about categorical variables, but I
can't seem to find an answer, so I am hoping someone here can help. I
found that if the factor levels are all numbers, fitting the model
with lm returns coefficient labels that are not very recognizable.

# Example: let's just assume that we want to fit this model
fit <- lm(height ~ age + Seed, data=Loblolly)

# Note that the category names are all mangled here
fit


Call:
lm(formula = height ~ age + Seed, data = Loblolly)

Coefficients:
(Intercept)          age       Seed.L       Seed.Q       Seed.C       Seed^4
   -1.31240      2.59052      4.86941      0.87307      0.37894     -0.46853
     Seed^5       Seed^6       Seed^7       Seed^8       Seed^9      Seed^10
    0.55237      0.39659     -0.06507      0.35074     -0.83442      0.42085
    Seed^11      Seed^12      Seed^13
    0.53906     -0.29803     -0.77254
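
In case it helps with diagnosing this, here is a small check of how
Seed is stored. As far as I can tell, Seed is an ordered factor in
Loblolly, and R's default contrasts for ordered factors are polynomial,
which might be where the .L/.Q/.C labels come from. That reading is my
assumption, not something I have confirmed.

```r
# Inspect how Seed is stored in the built-in Loblolly data set
class(Loblolly$Seed)       # class vector of the column
is.ordered(Loblolly$Seed)  # TRUE would mean an ordered factor
levels(Loblolly$Seed)      # the numeric-looking level names
options("contrasts")       # default contrasts for unordered/ordered factors
```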



One possible solution I found is to rename the factor levels by adding
a letter prefix:

seed.str <- paste("S", Loblolly$Seed, sep="")
seed.str <- factor(seed.str)
fit <- lm(height ~ age + seed.str, data=Loblolly)
fit



Call:
lm(formula = height ~ age + seed.str, data = Loblolly)

Coefficients:
  (Intercept)           age  seed.strS303  seed.strS305  seed.strS307
      -0.4301        2.5905        0.8600        1.8683       -1.9183
seed.strS309  seed.strS311  seed.strS315  seed.strS319  seed.strS321
       0.5350       -1.5933       -0.8867       -0.3650       -2.0350
seed.strS323  seed.strS325  seed.strS327  seed.strS329  seed.strS331
       0.3067       -1.3233       -2.6400       -2.9333       -2.2267


Now it is actually possible to see which one is which, but this is
kind of lame. Can someone point me to a more elegant solution? Thank
you so much.
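
For reference, here is a variant of the workaround without the "S"
prefix, as a minimal sketch. It assumes the odd labels come from Seed
being an ordered factor and not from the numeric names themselves. The
names d and fit2 are made up for illustration, and I am not sure this
is the recommended way to do it.

```r
# Hypothetical copy so the built-in data set is left untouched
d <- as.data.frame(Loblolly)

# Rebuild Seed from its character labels as a plain (unordered) factor
d$Seed <- factor(as.character(d$Seed))

# Refit the same model on the copy
fit2 <- lm(height ~ age + Seed, data = d)

# Coefficient labels should now carry the seed codes
names(coef(fit2))
```

Going through as.character() makes factor() sort the levels by their
labels, so the reference level should be the same one as in the
prefixed version.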

Saiwing Yeung



