[Rd] stringsAsFactors

Tue Feb 12 00:05:22 CET 2013

Peter,
   I had an earlier response to Duncan that I should have copied to the list.

The subset issue can be fixed.  When the model changes character to factor, it needs to 
remember the levels, just as it does with factors.  We are simply seeing a reprise of the 
problems that occurred when models didn't remember factor levels -- I've been down this 
road before.  Lots of ideas and workarounds were tried, none of which worked until that 
memory was added ($xlevels in lm, glm, coxph, etc. fits).  Can everything be fixed, in the 
sense that R always makes the right choices for my data?  I seriously doubt it.
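A minimal R sketch of the subsetting problem and of the $xlevels "memory" described here. The toy data frame and the names df, sub and fit are hypothetical, and it assumes a reasonably recent R in which lm() records the levels of character predictors in $xlevels.

```r
# Toy data with a character grouping column (no automatic factor conversion)
df <- data.frame(g = c("a", "b", "c", "a", "b", "c"),
                 y = c(1, 2, 3, 1.5, 2.5, 3.5),
                 stringsAsFactors = FALSE)

# Converting to factor *after* subsetting gives a different set of levels
sub <- df[df$g != "c", ]
levels(factor(df$g))    # "a" "b" "c"
levels(factor(sub$g))   # "a" "b"

# The fit remembers the levels it saw when it converted g internally
fit <- lm(y ~ g, data = sub)
fit$xlevels$g           # "a" "b"

# predict() rebuilds the factor from fit$xlevels, so one-row newdata works...
predict(fit, newdata = data.frame(g = "b"))   # 2.25

# ...and a level the fit never saw is an error rather than a silent mismatch
try(predict(fit, newdata = data.frame(g = "c")))
```

Without the stored levels, factor("b") in newdata would have only one level and the model matrix could not be rebuilt consistently with the fit.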

As to stringsAsFactors -- the right answer is the one that causes each person the least 
bother.  For me that is stringsAsFactors = "some", which means that I turn it off and 
build the ones I need.  The right global default is likely whichever one causes 
members of R Core the least bother :-)
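One way to follow the "turn it off and build the ones I need" approach, sketched in R under the assumption that the full set of levels is known up front. The inline CSV text and the column names are hypothetical. A factor built before subsetting keeps all of its levels, so a subset shows a zero count instead of quietly losing a category.

```r
# Read without automatic conversion; the inline CSV stands in for a real file
txt <- "g,y\na,1\nb,2\nc,3\na,1.5"
df <- read.csv(text = txt, stringsAsFactors = FALSE)

# Build the factor explicitly, with the full set of levels, before any subsetting
df$g <- factor(df$g, levels = c("a", "b", "c"))

sub <- df[df$g != "c", ]
levels(sub$g)               # "a" "b" "c": `[` keeps unused levels
table(sub$g)                # a = 2, b = 1, c = 0

# Drop unused levels only when that is really what is wanted
levels(droplevels(sub)$g)   # "a" "b"
```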

On 02/11/2013 04:46 PM, peter dalgaard wrote:
> It's logically impossible I'd say. If you want to do conversion from character to factor on an as-needed basis, you _will_ have issues with subsetting operations affecting the set of levels.
>
> The logical way out is to define factors before subsetting. As far as possible, create them up front. Doing it automagically in read.table is far from infallible, but at least has some chance of getting it roughly right. In my view, this is actually a pretty strong argument for keeping stringsAsFactors==TRUE.
>
> (