therneau at mayo.edu
Tue Feb 12 00:05:22 CET 2013
I had an earlier response to Duncan that I should have copied to the list.
The subset issue can be fixed. When the model changes character to factor, it needs to
remember the levels; just like it does with the factors. We are simply seeing a reprise
of problems that occurred when models didn't remember factor levels -- I've been down this
road before. Lots of ideas and workarounds were tried, none of which worked until that
memory was added ($xlevels in lm, glm, coxph, etc. fits). Can everything be fixed, in the
sense that R always makes the right choices for my data? I seriously doubt it.
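A minimal sketch of the kind of memory I mean, assuming a current R in which lm() records levels for character predictors as well as for factors. The toy data frame d and its columns are made up for illustration.

```r
# Hypothetical toy data; 'grp' is a character column, not a factor.
d <- data.frame(
  y   = c(1.0, 2.1, 2.9, 4.2, 5.1, 5.8),
  grp = c("a", "b", "c", "a", "b", "c"),
  stringsAsFactors = FALSE
)

fit <- lm(y ~ grp, data = d)

# The fit stores the levels it created when it converted 'grp'.
lev <- fit$xlevels$grp

# New data that contains only one of the levels: the stored levels are
# used to rebuild the design matrix, so the prediction is still well defined.
new <- data.frame(grp = "c", stringsAsFactors = FALSE)
p <- predict(fit, newdata = new)
```

Here lev should hold all three levels, and p should be the mean of y in group "c", even though new never mentions "a" or "b".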
As to stringsAsFactors -- the right answer is the one that causes each person the least
bother. For me that is stringsAsFactors = "some", which means that I turn it off and
build the ones I need. The right global default, likely, is whichever one causes
members of R Core the least bother :-)
On 02/11/2013 04:46 PM, peter dalgaard wrote:
> It's logically impossible I'd say. If you want to do conversion from character to factor on an as-needed basis, you _will_ have issues with subsetting operations affecting the set of levels.
> The logical way out is to define factors before subsetting. As far as possible, create them up front. Doing it automagically in read.table is far from infallible, but at least has some chance of getting it roughly right. In my view, this is actually a pretty strong argument for keeping stringsAsFactors==TRUE.
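For anyone following along, Peter's point about ordering can be seen in a few lines. This is only an illustrative sketch; the data frame d and the names late and early are made up.

```r
# Hypothetical toy data with a character column.
d <- data.frame(
  id  = 1:6,
  grp = c("a", "b", "c", "a", "b", "c"),
  stringsAsFactors = FALSE
)

# Convert after subsetting: level "c" is never seen, so it is lost.
late <- factor(d$grp[d$grp != "c"])

# Define the factor up front: the subset still carries all three levels.
d$grp <- factor(d$grp)
early <- d$grp[d$grp != "c"]

levels(late)   # two levels
levels(early)  # three levels, with "c" present but unused
```

Tabulating early with table() gives a zero count for "c", whereas late has no record that "c" ever existed.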