[R] stringsAsFactors global option (was "character coerced to a factor")

Prof Brian Ripley ripley at stats.ox.ac.uk
Mon Apr 23 15:48:03 CEST 2007

On Mon, 23 Apr 2007, Terry Therneau wrote:

> --- Gabor Grothendieck <ggrothendieck at gmail.com> wrote:
>> Just one caveat.  I personally would try to avoid using
>> global options since it can cause conflicts when
>> two different programs assume two different settings
>> of the same global option and need to interact.
>   I see this argument often, and don't buy it.  In any case, for this
> particular option, the Mayo biostatistics group (~120 users) has had
> stringsAsFactors=F as a global default for 15+ years now with no ill effects.
> It is much less confusing for both new and old users.
>   John Kane asked "Any idea what the rationale was for setting the
> option to TRUE?"  When factors were first introduced, there was no option
> to turn them off.  Reading between the lines of the white book (Statistical
> Models in S) that introduced them, this is my guess: they made perfect sense for
> the particular data sets that were being analysed by the authors at the time.
> Many of the defaults in the survival package, which I wrote, have exactly the
> same rationale --- so let us not be too harsh on an author for not foreseeing
> all the future consequences of a default!
>  A place where factors really are a pain is when the patient id is a character
> string.  When, for instance, you subset the data to do an analysis of only
> the females, having the data set `remember' all of the male id's (the original
> levels) is non-productive in dozens of ways.  For other variables factors
> work well and have some nice properties.  In general, I've found in my work
> (medical research) that factors are beneficial for about 1/5 of the character
> variables, a PITA for 1/4, and a wash for the rest; so I prefer to do any
> transformations myself.
> For the historically curious:
>   In Splus, one originally fixed this with an override of the function
>   	as.data.frame.character <- as.data.frame.vector
> before they added the global option.  In R, unfortunately, this override
> didn't work due to namespaces, and we had to wait for the option to be
> added.  (Another damned-if-you-do, damned-if-you-don't issue.  Normally you
> don't want users to be able to override a base function, because 9 times out
> of 10 they did it by accident and don't want it either.  But when a user really
> does want to do so ...)

That is what 'assignInNamespace' is for (and it came in with namespaces).

Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

More information about the R-help mailing list