[Rd] irrelevant warning message

hadley wickham h.wickham at gmail.com
Tue Jan 13 01:21:47 CET 2009


> PS. Here are two interrelated reasons we don't autoconvert:
>
>  1. Subject id.  Factors give no advantage for a unique id, and some clear
> problems.  In particular, when one creates a subset (everyone over 60, say),
> there is no good reason to remember all the ids you didn't select.
>  2. Subject id.  I work on a lot of studies of fractures and fracture risk.  A
> time-trend model might be
>        gam(fracture ~ subject + x1 + x2 + ..., subset=(sex=='F'))
>
>  Fracture risk for males and females is so different that separate models are
> the sensible thing.  If subject is a factor before the call, then my model has a
> zillion unneeded levels.  There are other ways out of this issue, but avoiding
> factors is the easiest.
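
To make the leftover-levels point concrete, here is a minimal sketch. The
data are made up for illustration (`subject` and `age` are hypothetical,
not from any real study):

```r
# Hypothetical data: four subjects, two of them over 60.
subject <- factor(c("s01", "s02", "s03", "s04"))
age     <- c(45, 62, 71, 58)

# Subsetting a factor keeps every original level, including unselected ids.
over60 <- subject[age > 60]
nlevels(over60)            # 4: all original levels are retained

# Re-applying factor() drops the unused levels.
nlevels(factor(over60))    # 2

# A character id carries no level set, so nothing is left over.
id_chr <- as.character(subject)[age > 60]
length(unique(id_chr))     # 2
```

Re-applying factor() is only one of the "other ways out"; keeping the id as
a character vector avoids the extra step.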

3.  Factors take up more memory than character vectors.

(This is tongue-in-cheek, but in recent versions of R, factor
variables take up very slightly more memory than character
variables.  It's a common myth that the opposite is true.)
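
One way to check this on your own machine is to compare object.size() for
the same values stored both ways. This is only a sketch: the id strings are
invented, and the numbers depend on the R version and platform, so no
expected sizes are shown.

```r
set.seed(1)

# Hypothetical id column: 100,000 values drawn from 1,000 distinct strings.
ids_chr <- sample(sprintf("id%04d", 1:1000), 1e5, replace = TRUE)
ids_fac <- factor(ids_chr)

# Estimated memory used by each representation (in bytes).
object.size(ids_chr)
object.size(ids_fac)

# Both representations hold the same values.
identical(as.character(ids_fac), ids_chr)  # TRUE
```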

I think R's handling of character vectors has progressed to the point
where they should be the norm, not the exception.  Maybe others will
have different views.

Hadley

-- 
http://had.co.nz/


