[Rd] stringsAsFactors = FALSE

Martin Maechler maechler at stat.math.ethz.ch
Mon Nov 17 21:47:34 CET 2008


>>>>> "WD" == William Dunlap <wdunlap at tibco.com>
>>>>>     on Mon, 17 Nov 2008 09:06:49 -0800 writes:

    >> From: r-devel-bounces at r-project.org
    >> [mailto:r-devel-bounces at r-project.org] On Behalf Of hadley wickham
    >> Sent: Monday, November 17, 2008 5:10 AM
    >> To: r-devel at r-project.org
    >> Subject: [Rd] stringsAsFactors = FALSE
    >> ...
    >> The key lines in expand.grid would seem to be
    >> 
    >>     if (!is.factor(x) && is.character(x))
    >>         x <- factor(x, levels = unique(x))
    >> 
    >> but I'm not sure why they are being converted to factors
    >> in the first place.

    WD> I think expand.grid converts input strings to factors so
    WD> they retain the order they have in the input.  (Note
    WD> that the levels argument is unique(x), not the
    WD> sort(unique(x)) that data.frame uses.)  People generally
    WD> give expand.grid sorted input and expect it to not alter
    WD> the order (the order of the levels affects tables
    WD> and some plots).

    WD> > lapply(expand.grid(Grade=c("Bad","Good","Better"),
    WD> +                    Size=c("Small","Medium","Large")), levels)
    WD> $Grade
    WD> [1] "Bad"    "Good"   "Better"
    WD>
    WD> $Size
    WD> [1] "Small"  "Medium" "Large"
    WD>
    WD> > lapply(data.frame(Grade=c("Bad","Good","Better"),
    WD> +                   Size=c("Small","Medium","Large")), levels)
    WD> $Grade
    WD> [1] "Bad"    "Better" "Good"
    WD>
    WD> $Size
    WD> [1] "Large"  "Medium" "Small"


    WD> I have nothing against adding the stringsAsFactors
    WD> argument to expand.grid.

That's fine, but I am VERY MUCH against 
making the default of that argument depend on the ominous
  default.stringsAsFactors()
which is determined by getOption("stringsAsFactors").

Why would I hate such a change so much?
 Note that we would then have an option that changes the
 result of a standard R (S) function, expand.grid().
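
For contrast, here is a minimal sketch of the alternative I could live
with: the choice is an explicit argument with a fixed default, so the
call itself tells you what comes back. It assumes expand.grid() accepts
a stringsAsFactors argument as proposed in this thread; the variable
names are only for illustration.

```r
# Sketch only: assumes expand.grid() accepts a stringsAsFactors
# argument, as proposed in this thread.
grade <- c("Bad", "Good", "Better")
size  <- c("Small", "Medium", "Large")

# The caller states the choice explicitly in the call itself.
g_fac <- expand.grid(Grade = grade, Size = size, stringsAsFactors = TRUE)
g_chr <- expand.grid(Grade = grade, Size = size, stringsAsFactors = FALSE)

levels(g_fac$Grade)  # factor levels keep the input order
class(g_chr$Grade)   # plain character column, no factor conversion
```

Nothing outside the two calls influences the two results.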

I already disliked that change when it happened for
read.table(). In that case one could at least argue that
read.table() is in some way platform dependent, because it
typically reads files from the local platform. But as we
know, this is not true even there: even now, if I tell my
students, or a book author tells her readers, to use
read.table("http://.....")  I can no longer be sure that they
all get the same data frame, because they could have
different settings of getOption("stringsAsFactors")
.... horrible, really!!
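
To make the problem concrete, here is a small sketch. It does not use
the real option; read_demo() and the option name
"demo.stringsAsFactors" are both invented for illustration, standing in
for any function whose default is taken from a global option.

```r
# Hypothetical reader, invented for illustration: its default is taken
# from a made-up global option "demo.stringsAsFactors".
read_demo <- function(text,
                      stringsAsFactors = getOption("demo.stringsAsFactors", TRUE)) {
  read.table(text = text, header = TRUE, stringsAsFactors = stringsAsFactors)
}

txt <- "grade size\nBad Small\nGood Large\n"

old <- options(demo.stringsAsFactors = TRUE)   # one student's session
a <- read_demo(txt)
options(demo.stringsAsFactors = FALSE)         # another student's session
b <- read_demo(txt)
options(old)                                   # restore the previous setting

class(a$grade)  # factor
class(b$grade)  # character
```

The two calls read_demo(txt) are textually identical, yet they return
different data frames.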

Please, R should stay as much a functional language as possible
and sensible!
If we start having global options more and more influence
the result of standard R functions, we are going down a very
slippery slope, and one that is making R even more idiosyncratic
than it already needs to be. 
Please, no !!  
Rather revert the read.table() default of "stringsAsFactors" to
not depend on the option, and maybe provide another set of short
forms of the various
       read.table(*, stringsAsFactors=FALSE)
incantations such that
all the factor-haters-string-lovers can use these short forms...
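
Such short forms could look roughly like the following. This is only a
sketch; the names read.table.s and read.csv.s are invented here and do
not exist in R.

```r
# Hypothetical short forms; the names are invented for illustration.
# Each one fixes stringsAsFactors = FALSE and passes everything else on.
read.table.s <- function(...) read.table(..., stringsAsFactors = FALSE)
read.csv.s   <- function(...) read.csv(..., stringsAsFactors = FALSE)

txt <- "grade,size\nBad,Small\nGood,Large\n"
d <- read.csv.s(text = txt)
class(d$grade)  # character, whatever the session options are
```

The function name alone then tells the reader which behaviour to
expect, and no global setting is consulted.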

At the very first DSC, 1999, John Eaton, author of GNU Octave,
told us how he regretted that he had started going down that bad
path, because users had started asking for it.
In the extreme case, we end up with a "language" that
depends on a whole huge global state, and what a given
function computes can no longer be predicted by looking at the
function calls, unless you simultaneously know that whole state.
Please, No !!

Martin Maechler, ETH Zurich


    WD> Bill Dunlap TIBCO Software Inc - Spotfire Division
    WD> wdunlap tibco.com



