[Rd] Regression stars
Brian Lee Yung Rowe
rowe at muxspace.com
Tue Feb 12 17:05:55 CET 2013
I thought the default was chosen for performance reasons. For large data.frames or repeated operations, factors should be faster than character columns holding non-trivial strings.
> fs <- c('apple','peach','watermelon','spinach','persimmon','potato','kale')
> n <- 1000000
> a1 <- data.frame(f=sample(fs,n,replace=TRUE), x1=rnorm(n), x2=rnorm(n), stringsAsFactors=TRUE)
> a2 <- data.frame(f=sample(fs,n,replace=TRUE), x1=rnorm(n), x2=rnorm(n), stringsAsFactors=FALSE)
> fn <- function(i,x) x[x$f %in% c('kale','spinach'),]
> system.time(z <- sapply(1:100, fn, a1))
   user  system elapsed
 19.614   4.037  24.649
> system.time(z <- sapply(1:100, fn, a2))
   user  system elapsed
 19.726   7.715  36.761
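A smaller-scale sketch of the same comparison follows, for anyone who wants to rerun it. It makes some assumptions that are not in the run above: n is reduced, a seed is fixed, the loop runs 20 times rather than 100, and both data.frames are built from the same sampled vector so that the only difference between them is how f is stored. Timings depend on the machine and R version, so none are shown.

```r
# Illustrative values only; not the settings used in the run above.
set.seed(1)
fs <- c('apple','peach','watermelon','spinach','persimmon','potato','kale')
n <- 100000
f <- sample(fs, n, replace=TRUE)
x1 <- rnorm(n)

# Same data in both frames; only the storage of column f differs.
a1 <- data.frame(f=f, x1=x1, stringsAsFactors=TRUE)
a2 <- data.frame(f=f, x1=x1, stringsAsFactors=FALSE)

keep <- c('kale','spinach')
fn <- function(x) x[x$f %in% keep, ]

z1 <- fn(a1)
z2 <- fn(a2)

# Both representations should select exactly the same rows.
same_rows <- identical(rownames(z1), rownames(z2))
same_values <- identical(as.character(z1$f), z2$f)

# Time repeated subsetting on each representation.
t_factor <- system.time(for (i in 1:20) fn(a1))
t_char <- system.time(for (i in 1:20) fn(a2))
print(t_factor)
print(t_char)
```

Sharing the sampled vector between a1 and a2 keeps the comparison about storage type rather than about two different random draws.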
On Feb 12, 2013, at 10:40 AM, Ben Bolker <bbolker at gmail.com> wrote:
> Thanks, Uwe.
> Now let me go one step farther.
> Can you (or anyone) give a good argument **other than backward
> compatibility** for keeping the stringsAsFactors=TRUE argument on
> I appreciate your distinction between data.frame() and read.table()'s
> use of stringsAsFactors, and I can see that there is some point for
> quick-and-dirty interactive use in setting all non-numeric variables to
> factors (arguing that wanting non-numerics as factors is somewhat more
> common than wanting them as strings).
> It might be nice to add an optional stringsAsFactors (and check.names)
> argument to transform(): I've had to write my own Transform() function
> to allow the defaults to be overridden, since transform() calls
> data.frame() with the defaults. (Setting the stringsAsFactors option
> globally would work, although not for check.names.)
> Ben Bolker
>>>> What I will likely do is
>>>> make a few changes so that character vectors are automatically changed
>>>> to factors in modelling functions, so that operating with
>>>> stringsAsFactors=FALSE doesn't trigger silly warnings.
>>>> Duncan Murdoch
>>> [apologies for snipping context: "gmane made me do it"]
>>> R-devel at r-project.org mailing list