[Rd] Why is there no c.factor?

Thomas Lumley tlumley at u.washington.edu
Thu Feb 4 18:06:47 CET 2010


On Thu, 4 Feb 2010, Hadley Wickham wrote:

> Hi all,
>
> Is there a reason that there is no c.factor method?  Analogous to
> c.Date, I'd expect something like the following to be useful:
>
> c.factor <- function(...) {
>  factors <- list(...)
>  levels <- unique(unlist(lapply(factors, levels)))
>  char <- unlist(lapply(factors, as.character))
>
>  factor(char, levels = levels)
> }
>
> c(factor("a"), factor("b"), factor(c("c", "b","a")), factor("d"))
> # [1] a b c b a d
> # Levels: a b c d
>

It's well established that different people have different views on what factors should do, but this doesn't match mine.   I think of factors as enumerated data types where the factor levels already specify all the valid values for the factor, so I wouldn't want to be able to combine two factors with different sets of levels.

For example:
   A <- factor("orange",levels=c("orange","yellow","red","purple"))
   B <- factor("orange", levels=c("orange","apple","mango", "banananana"))
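
To make the example concrete, here is a minimal sketch of what each approach yields for A and B. The helper name union_combine is hypothetical: it is the body of the proposed c.factor under another name, so that c() itself is not overridden. The integer codes are taken with as.integer(), so the result does not depend on how a given R version's c() treats factors.

```r
A <- factor("orange", levels = c("orange", "yellow", "red", "purple"))
B <- factor("orange", levels = c("orange", "apple", "mango", "banananana"))

# Hypothetical helper: same logic as the proposed c.factor,
# renamed so that c() itself is left untouched.
union_combine <- function(...) {
  factors <- list(...)
  lv <- unique(unlist(lapply(factors, levels)))
  ch <- unlist(lapply(factors, as.character))
  factor(ch, levels = lv)
}

# Union-of-levels approach: the colour levels and the fruit levels
# are merged into a single level set.
merged <- union_combine(A, B)
levels(merged)

# Reducing to numbers: both values have integer code 1,
# and the information about which level set each came from is lost.
codes <- c(as.integer(A), as.integer(B))
codes
```

Both "orange" values come out identical under either approach, although one was declared as a colour and the other as a fruit.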

On the other hand, I think the current behaviour, which reduces them to numbers, is just wrong.
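
One way to read the enumerated-type view is a combine function that refuses factors whose level sets differ. The following is only a sketch under that assumption: combine_strict is a hypothetical name, not a function in base R, and requiring identical levels in identical order is a design choice made here for illustration.

```r
# Hypothetical helper, not part of base R: combine factors only when
# every argument has exactly the same levels in the same order.
combine_strict <- function(...) {
  factors <- list(...)
  lv <- levels(factors[[1]])
  same <- vapply(factors, function(f) identical(levels(f), lv), logical(1))
  if (!all(same)) {
    stop("factors have different level sets")
  }
  factor(unlist(lapply(factors, as.character)), levels = lv)
}

colours <- c("orange", "yellow", "red", "purple")
x <- factor("red", levels = colours)
y <- factor("orange", levels = colours)

# Same level set: the result is a factor with those levels.
ok <- combine_strict(x, y)

# Different level sets: the call fails instead of merging or
# falling back to integer codes.
msg <- tryCatch(
  combine_strict(x, factor("apple")),
  error = function(e) conditionMessage(e)
)
```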


       -thomas

Thomas Lumley			Assoc. Professor, Biostatistics
tlumley at u.washington.edu	University of Washington, Seattle


