[Rd] Regression stars

Duncan Murdoch murdoch.duncan at gmail.com
Tue Feb 12 20:19:51 CET 2013

On 12/02/2013 1:47 PM, Hervé Pagès wrote:
> On 02/12/2013 08:20 AM, peter dalgaard wrote:
> >
> > On Feb 12, 2013, at 17:05 , Brian Lee Yung Rowe wrote:
> >
> >>
> >> I thought that the default was the way it was for performance reasons. For large data.frames or repeated applications, using factors should be faster for non-trivial strings.
> >
> > I think not. Historically, it's more like "In statistics we have two kinds of variables, numerical and categorical. OK, so we have the occasional truly character-type variables like name and address, let's handle those as a special case".
> <sarcasm>
> Since character vectors are sooooo bad and people use them where
> they should instead use a factor, I propose to go all the way
> by adding the stringsAsFactors arg to character() too. That way
> people are put on the right track from the very start.
> </sarcasm>

I think you are misreading what Peter wrote.  He wasn't defending that
point of view; he was describing it.

> No seriously, if my variable is categorical, it's already in a factor
> and that's how I pass it to data.frame(). But if I have it in a
> character vector, it's because that's how I want it. It's my choice.
> How could anybody ever think that having data.frame() alter his/her
> data is a good thing?
> Please *remove* the stringsAsFactors arg of data.frame() in R 3.0.
> You'll do a big favor to your user base.

That's a really bad suggestion -- it would break code for people who set
stringsAsFactors=FALSE as well as those who rely on the current default
behaviour.  We certainly won't do that.

Duncan Murdoch
