[Rd] irrelevant warning message
hadley wickham
h.wickham at gmail.com
Tue Jan 13 01:21:47 CET 2009
> PS. Here are two interrelated reasons we don't autoconvert:
>
> 1. Subject id. Factors give no advantage for a unique id, and some clear
> problems. In particular when one creates a subset - everyone over 60 say -
> there is no good reason to remember all the ids you didn't select.
> 2. Subject id. I work on a lot of studies of fractures and fracture risk. A
> time-trend model might be
> gam(fracture ~ subject + x1 + x2 + ..., subset=(sex=='F'))
>
> Fracture risk for males and females is so different that separate models are
> the sensible thing. If subject is a factor before the call, then my model has a
> zillion unneeded levels. There are other ways out of this issue, but avoiding
> factors is the easiest.
3. Factors take up more memory than character vectors.
(This is tongue-in-cheek, but in recent versions of R, factor
variables take up (very very slightly) more memory than character
variables. It's a common myth that the opposite is true.)
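
A rough way to check the memory claim on your own machine is sketched
below. The vector is made up for illustration (1000 ids drawn from 50
hypothetical labels), and object.size() only gives an estimate, so the
numbers will vary with R version and platform (32-bit vs 64-bit in
particular). No expected output is shown for that reason.

```r
# Hypothetical data: 1000 subject ids drawn from 50 distinct labels.
set.seed(1)
ids_chr <- sample(paste0("subject", 1:50), 1000, replace = TRUE)
ids_fac <- factor(ids_chr)

# object.size() (utils package) estimates the memory used by each object.
size_chr <- object.size(ids_chr)
size_fac <- object.size(ids_fac)

print(size_chr)
print(size_fac)
```

Which one comes out smaller depends on the length of the vector, the
number of distinct values, and the build of R, so treat this as a quick
check rather than a benchmark.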
I think R's handling of character vectors has progressed to the point
where they should be the norm, not the exception. Maybe others will
have different views.
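
To make points 1 and 2 in the quoted text concrete, here is a small
sketch of what happens to a factor id when you subset. The data frame
and its column names (id_chr, id_fac, age) are invented for the example.

```r
# Hypothetical data: four subjects, id stored as character and as factor.
d <- data.frame(
  id_chr = c("s1", "s2", "s3", "s4"),
  age    = c(45, 62, 71, 58),
  stringsAsFactors = FALSE
)
d$id_fac <- factor(d$id_chr)

# Keep everyone over 60 (rows for s2 and s3).
over60 <- d[d$age > 60, ]

# The factor still carries the levels of the subjects we dropped ...
levels(over60$id_fac)
nlevels(over60$id_fac)

# ... while the character column only holds the ids that were selected.
unique(over60$id_chr)

# One of the "other ways out": rebuild the factor to drop unused levels.
nlevels(factor(over60$id_fac))
```

With a factor id the unused levels have to be dropped explicitly after
every subset; with a character id there is nothing to clean up.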
Hadley
--
http://had.co.nz/