[R] Opinion: Why I find factors convenient to use

Rui Barradas ruipbarradas at sapo.pt
Mon Aug 20 14:03:14 CEST 2012


Hello,

On 20-08-2012 12:30, S Ellison wrote:
>> Over the years, many people -- including some who I would
>> consider real expeRts -- have criticized factors and
>> advocated the use (sometimes exclusively) of character
>> vectors instead.
> Exclusive use of character vectors is not going to do the job.
>
> The concept of a factor is fundamental to a lot of statistics; a programming environment that does not implement factors and their associated special behaviour is probably not a statistical programming language.
>
> Special behaviours I have in mind include:
> - Level order can be arbitrarily specified for display purposes
> - A control level can be intentionally chosen for contrasts
> - the option of "ordered" factors (for example, for polr and the like)
>
> So I think the language does and will require a 'factor' type in one form or another.
>
> _When_ to convert a character input to a factor is, of course, up to the user, and for cleanup it's very often better to stick with character early and convert to factor a bit later. But personally, I think that there is sufficient control over the coding of data to allow user discretion. On the whole, character input, when it is used at all, is used as factor data so much of the time that stringsAsFactors=TRUE seems to me the more sensible default.

I disagree with this last point. Just think of the number of questions 
to this list about, say, dates. When dates are read from a file with one 
of the forms of read.table, they come in as factors by default and 
usually cause problems, unless the user is experienced, in which case 
he/she might not have a question to ask.
Besides, the default TRUE contradicts "stick with character early and 
convert to factor a bit later", both the "early" and the "later" part.
Changing the default behavior of a heavily used function from one 
version of R to the next is a different matter. What about all the code 
in use? Maybe it's better to leave it be.
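
To make the dates point concrete, here is a minimal sketch. The input 
text and the names txt, as_factor and as_char are made up for 
illustration, and stringsAsFactors is set explicitly so the result does 
not depend on the default of the R version in use.

```r
# Hypothetical two-row input; column names and values are invented.
txt <- "id,when\n1,2012-08-20\n2,2012-08-19"

# Read the same text twice, once with each setting of stringsAsFactors.
as_factor <- read.csv(text = txt, stringsAsFactors = TRUE)
as_char   <- read.csv(text = txt, stringsAsFactors = FALSE)

class(as_factor$when)   # "factor"
class(as_char$when)     # "character"

# The usual trap: as.numeric() on a factor returns the internal
# integer codes, not anything derived from the date strings.
codes <- as.numeric(as_factor$when)

# An explicit conversion gives the same dates from either version.
d1 <- as.Date(as.character(as_factor$when))
d2 <- as.Date(as_char$when)
identical(d1, d2)       # TRUE
```

Either way the user has to convert the column explicitly; the 
character version just avoids the detour through factor codes.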

Rui Barradas
>
> S Ellison