[Rd] stringsAsFactors

Terry Therneau therneau at mayo.edu
Mon Feb 11 14:50:02 CET 2013

I think your idea to remove the warnings is excellent, and a good compromise.  Characters 
already work fine in modeling functions except for the silly warning.
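
As a minimal sketch of that point (the data frame `d` and its columns are invented for illustration), a character predictor can be handed straight to lm(), which treats it as a factor when it builds the model matrix:

```r
# Hypothetical toy data: 'g' is deliberately kept as a character column.
d <- data.frame(
  y = c(1, 2, 3, 4, 5, 6),
  g = rep(c("a", "b", "c"), each = 2),
  stringsAsFactors = FALSE
)

# lm() codes the character column as a factor internally,
# so the fit gets one coefficient per non-reference level.
fit <- lm(y ~ g, data = d)
print(coef(fit))
```

Whether a warning is issued along the way depends on the R version in use.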

It is interesting how often the defaults for a program reflect the data sets in use at the 
time the defaults were chosen.  There are some such in my own survival package whose 
proper value is no longer as "obvious" as it was when I chose them.  Factors are very 
handy for variables which have only a few levels and will be used in modeling.  Every 
character variable of every dataset in "Statistical Models in S", which introduced 
factors, is of this type so auto-transformation made a lot of sense.  The "solder" data 
set there is one for which Helmert contrasts are proper so guess what the default contrast 
option was?  (I think there are only a few data sets in the world for which Helmert makes 
sense, however, and R eventually changed the default.)
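
To make the contrast choice concrete, here is a small sketch (the factor `f` is made up) that builds the model matrix for the same three-level factor under treatment and under Helmert contrasts. It passes the contrast explicitly through `contrasts.arg` rather than relying on whatever `options("contrasts")` happens to be set to:

```r
# Hypothetical three-level factor.
f <- factor(c("a", "b", "c"))

# Treatment contrasts: each non-reference level gets a 0/1 indicator column.
mm_treatment <- model.matrix(~ f, contrasts.arg = list(f = "contr.treatment"))

# Helmert contrasts: each column compares a level with the mean of the earlier ones.
mm_helmert <- model.matrix(~ f, contrasts.arg = list(f = "contr.helmert"))

print(mm_treatment)
print(mm_helmert)
```

The treatment columns read directly as "difference from the first level", which is likely why that coding suits most data sets better.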

For character variables that should not be factors, such as a street address,
stringsAsFactors can be a real PITA, and I expect that people's preference for the option 
depends almost entirely on how often these arise in their own work.  As long as there is 
an option that can be overridden I'm okay.  Yes, I'd prefer FALSE as the default, partly 
because the current value is a tripwire in the hallway that eventually catches every new user.
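
A short sketch of the tripwire, using invented address data (the column names and values are assumptions for illustration). The stringsAsFactors argument is set explicitly in both calls because its default differs between R versions (it became FALSE in R 4.0.0):

```r
# Hypothetical address data; house numbers arrive as text.
house  <- c("10", "200", "5")
street <- c("Elm St", "Oak Ave", "Main St")

d_fac <- data.frame(house = house, street = street, stringsAsFactors = TRUE)
d_chr <- data.frame(house = house, street = street, stringsAsFactors = FALSE)

print(class(d_fac$street))  # factor
print(class(d_chr$street))  # character

# The tripwire: converting a factor to integer returns the internal
# level codes, not the numbers that the labels spell out.
codes_from_factor <- as.integer(d_fac$house)
values_from_chars <- as.integer(d_chr$house)

# The usual workaround is to go through as.character() first.
values_recovered <- as.integer(as.character(d_fac$house))
```

Overriding the option per call, as here, keeps a script independent of the session-wide setting.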

Terry Therneau

On 02/11/2013 05:00 AM, r-devel-request at r-project.org wrote:
> Both of these were discussed by R Core.  I think it's unlikely the
> default for stringsAsFactors will be changed (some R Core members like
> the current behaviour), but it's fairly likely the show.signif.stars
> default will change.  (That's if someone gets around to it:  I
> personally don't care about that one.  P-values are commonly used
> statistics, and the stars are just a simple graphical display of them.
> I find some p-values to be useful, and the display to be harmless.)
> I think it's really unlikely the more extreme changes (i.e. dropping
> show.signif.stars completely, or dropping p-values) will happen.
> Regarding stringsAsFactors:  I'm not going to defend keeping it as is,
> I'll let the people who like it defend it.  What I will likely do is
> make a few changes so that character vectors are automatically changed
> to factors in modelling functions, so that operating with
> stringsAsFactors=FALSE doesn't trigger silly warnings.
