[Rd] stringsAsFactors = FALSE
Martin Maechler
maechler at stat.math.ethz.ch
Mon Nov 17 21:47:34 CET 2008
>>>>> "WD" == William Dunlap <wdunlap at tibco.com>
>>>>> on Mon, 17 Nov 2008 09:06:49 -0800 writes:
>> From: r-devel-bounces at r-project.org
>> [mailto:r-devel-bounces at r-project.org] On Behalf Of hadley wickham
>> Sent: Monday, November 17, 2008 5:10 AM
>> To: r-devel at r-project.org
>> Subject: [Rd] stringsAsFactors = FALSE
>> ...
>> The key lines in expand.grid would seem to be
>>
>>     if (!is.factor(x) && is.character(x)) x <- factor(x, levels = unique(x))
>>
>> but I'm not sure why they are being converted to factors
>> in the first place.
WD> I think expand.grid converts input strings to factors so
WD> they retain the order they have in the input. (Note
WD> that the levels argument is unique(x), not the
WD> sort(unique(x)) that data.frame uses.) People generally
WD> give expand.grid sorted input and expect it to not alter
WD> the order (the order of the levels affects tables
WD> and some plots).
WD> > lapply(expand.grid(Grade=c("Bad","Good","Better"),Size=c("Small","Medium","Large")), levels)
WD> $Grade
WD> [1] "Bad"    "Good"   "Better"
WD>
WD> $Size
WD> [1] "Small"  "Medium" "Large"
WD>
WD> > lapply(data.frame(Grade=c("Bad","Good","Better"),Size=c("Small","Medium","Large")), levels)
WD> $Grade
WD> [1] "Bad"    "Better" "Good"
WD>
WD> $Size
WD> [1] "Large"  "Medium" "Small"
WD> I have nothing against adding the stringsAsFactors
WD> argument to expand.grid.
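For comparison, a minimal sketch of what explicit control looks like, assuming a current R version in which expand.grid() does have a stringsAsFactors argument whose default is a fixed TRUE (it does not consult any option). The variable names are taken from the example above.

```r
## Assumes a current R version: expand.grid() has a
## stringsAsFactors argument whose default is a fixed TRUE.
g1 <- expand.grid(Grade = c("Bad", "Good", "Better"),
                  Size  = c("Small", "Medium", "Large"))
levels(g1$Grade)   # levels keep the input order: "Bad" "Good" "Better"

## Explicitly ask for character columns instead of factors
g2 <- expand.grid(Grade = c("Bad", "Good", "Better"),
                  Size  = c("Small", "Medium", "Large"),
                  stringsAsFactors = FALSE)
class(g2$Grade)    # "character"
nrow(g2)           # 9: one row per Grade/Size combination
```

Because the default is a constant, the same call gives the same result on every machine.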
That's fine, but I am VERY MUCH against
making the default of that argument depend on the ominous
default.stringsAsFactors()
which is determined by getOption("stringsAsFactors").
Why would I hate such a change so much?
Note that here we would have an option which changes the
result of a standard R (S) function, expand.grid().
I already disliked that change when it happened for
read.table(), but in that case one could at least say that
read.table() is in some way platform dependent
{(because it
typically depends on files of the local platform; but as we
know, this is not true even there: even now, if I tell my
students, or a book author tells her readers, to use
read.table("http://.....") I can no longer be sure that my
students get the same data frame, because they could have
different settings of getOption("stringsAsFactors")
.... horrible, really!! )}
Please, R should stay as much of a functional language as is possible
and sensible!
If we start letting global options influence the results of
standard R functions more and more, we are going down a very
slippery slope, and one that makes R even more idiosyncratic
than it needs to be.
Please, no !!
Rather, revert the read.table() default of "stringsAsFactors" so that
it does not depend on the option, and maybe provide another set of
short forms of the various
   read.table(*, stringsAsFactors=FALSE)
incantations, so that
all the factor-haters-string-lovers can use these short forms...
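One such short form could be a thin wrapper. This is only a sketch: the name readTableChr is hypothetical and not part of R, and the usage line assumes an R version whose read.table() accepts a text argument.

```r
## Hypothetical short form; readTableChr is an illustrative name,
## not a function provided by R.
readTableChr <- function(...) {
  ## Always pass stringsAsFactors = FALSE explicitly, so the result
  ## never depends on a global option.
  read.table(..., stringsAsFactors = FALSE)
}

## Usage (assumes read.table() supports the 'text' argument)
d <- readTableChr(text = "grade size\nBad Small\nGood Large",
                  header = TRUE)
str(d)
```

The wrapper fixes the argument in the call itself, so anyone reading the code can see what it returns without knowing the session's options.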
At the very first DSC, in 1999, John Eaton, author of GNU Octave,
told us how he regretted having started down that bad
path, because users had started asking for it.
In the extreme case, we are ending up with a "language" that
depends on a whole huge status setting, and what a given
function computes can no longer be predicted by looking at the
function calls, unless you simultaneously know that whole status.
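To make the point concrete, a small sketch contrasting the two designs. The option name myDemo.asFactor and both functions are hypothetical, invented only for this illustration.

```r
## Hypothetical option "myDemo.asFactor"; not a real R option.
asFactorGlobal <- function(x) {
  ## Result depends on hidden global state (default TRUE if unset)
  if (isTRUE(getOption("myDemo.asFactor", TRUE))) factor(x) else x
}
asFactorExplicit <- function(x, asFactor = TRUE) {
  ## Result depends only on the arguments visible in the call
  if (asFactor) factor(x) else x
}

x <- c("b", "a")
old <- options(myDemo.asFactor = FALSE)
class(asFactorGlobal(x))     # "character": same call, different result
options(old)                 # restore (removes the demo option)
class(asFactorGlobal(x))     # "factor"
class(asFactorExplicit(x))   # "factor", whatever the options are
```

The identical call asFactorGlobal(x) returns different types depending on session state, whereas asFactorExplicit(x) can be understood from the call alone.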
Please, No !!
Martin Maechler, ETH Zurich
WD> Bill Dunlap TIBCO Software Inc - Spotfire Division
WD> wdunlap tibco.com
WD> ______________________________________________
WD> R-devel at r-project.org mailing list
WD> https://stat.ethz.ch/mailman/listinfo/r-devel