[R] stringsAsFactor global option (was "character coerced to a factor")

Terry Therneau therneau at mayo.edu
Mon Apr 23 14:59:38 CEST 2007

--- Gabor Grothendieck <ggrothendieck at gmail.com>

> Just one caveat.  I personally would try to avoid using
> global options since it can cause conflicts when
> two different programs assume two different settings
> of the same global option and need to interact.

   I see this argument often, and don't buy it.  In any case, for this
particular option, the Mayo biostatistics group (~120 users) has had 
stringsAsFactors=F as a global default for 15+ years now with no ill effects.
It is much less confusing for both new and old users.
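
For readers who have not used the option, here is a minimal sketch of setting
it for a session.  The data frame and its column names are made up for
illustration, and the sketch assumes an R version in which data.frame()
honours the option (or already defaults to not converting strings).

```r
# Set the session-wide default, remembering the previous value.
old <- options(stringsAsFactors = FALSE)

# Hypothetical data; 'id' and 'arm' are illustrative names only.
d <- data.frame(id = c("p1", "p2"), arm = c("A", "B"))
print(sapply(d, class))      # both columns stay character

# Convert by hand only where a factor is actually wanted.
arm_f <- factor(d$arm)
print(levels(arm_f))

options(old)                 # restore the previous setting
```

A site-wide default would normally go in a startup file rather than in each
script; the save-and-restore above is only there to keep the example from
changing the session it is run in.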

   John Kane asked "Any idea what the rationale was for setting the
option to TRUE?"  When factors were first introduced, there was no option
to turn them off.  Reading between the lines of the white book (Statistical
Models in S) that introduced them, this is my guess: they made perfect sense for
the particular data sets that were being analysed by the authors at the time.
Many of the defaults in the survival package, which I wrote, have exactly the
same rationale, so let us not be too harsh on an author for not foreseeing
all the future consequences of a default!

  A place where factors really are a pain is when the patient id is a character
string.  When, for instance, you subset the data to do an analysis of only
the females, having the data set "remember" all of the male ids (the original
levels) is counterproductive in dozens of ways.  For other variables factors
work well and have some nice properties.  In general, I've found in my work
(medical research) that factors are beneficial for about 1/5 of the character
variables, a PITA for 1/4, and a wash for the rest; so I prefer to do any
transformations myself.
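
A small sketch of the patient id problem described above.  The data are
invented for illustration, and stringsAsFactors is passed explicitly in each
call so that the result does not depend on the session default.

```r
# Hypothetical toy data; names and values are made up for illustration.
patients <- data.frame(
  id  = c("A01", "A02", "A03", "A04"),
  sex = c("F", "M", "F", "M"),
  stringsAsFactors = TRUE    # request factor conversion explicitly
)

# Subset to the females only.
females <- patients[patients$sex == "F", ]

# The id factor still carries the levels of the dropped male rows.
kept_levels <- levels(females$id)
print(kept_levels)
print(table(females$id))     # zero-count cells appear for the male ids

# Same subset with ids kept as plain character strings: nothing to remember.
patients_chr <- data.frame(
  id  = c("A01", "A02", "A03", "A04"),
  sex = c("F", "M", "F", "M"),
  stringsAsFactors = FALSE
)
females_chr <- patients_chr[patients_chr$sex == "F", ]
print(unique(females_chr$id))
```

With the factor version, the stale levels can be dropped by re-factoring the
column, e.g. females$id <- factor(females$id), but that has to be repeated
after every subset.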

For the historically curious: 
   In Splus, one originally fixed this with an override of the function
   	as.data.frame.character <- as.data.frame.vector
before they added the global option.  In R, unfortunately, this override
didn't work due to namespaces, and we had to wait for the option to be
added.  (Another damned-if-you-do, damned-if-you-don't issue.  Normally you
don't want users to be able to override a base function, because 9 times out
of 10 they did it by accident and don't want it either.  But when a user really
does want to do so ...)

	Terry Therneau
