[R] rbind() of factors in data.frame

Prof Brian Ripley ripley at stats.ox.ac.uk
Fri Apr 20 00:08:26 CEST 2007


On Thu, 19 Apr 2007, Albrecht, Dr. Stefan (AZ Private Equity Partner) wrote:

> I would like to inquire whether it is a desired feature that combining 
> two data frames with factor columns using rbind() does not sort the 
> factor levels of the combined data frame.

Yes, and a documented one. To wit, the help page for rbind() says

      Factors have their levels expanded as necessary (in the order of
      the levels of the levelsets of the factors encountered) and the
      result is an ordered factor if and only if all the components were
      ordered factors.  (The last point differs from S-PLUS.)
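
As a minimal sketch of the "ordered if and only if" clause quoted above, 
the following binds two single-column data frames, once with both columns 
ordered and once with only the first ordered. The names lv, both_ordered 
and mixed are illustrative only and do not come from the help page.

```r
# Hypothetical level set, used only for this illustration.
lv <- c("low", "high")

# Both components are ordered factors.
both_ordered <- rbind(
  data.frame(a = factor("low",  levels = lv, ordered = TRUE)),
  data.frame(a = factor("high", levels = lv, ordered = TRUE))
)

# Only the first component is an ordered factor.
mixed <- rbind(
  data.frame(a = factor("low",  levels = lv, ordered = TRUE)),
  data.frame(a = factor("high", levels = lv))
)

is.ordered(both_ordered$a)   # per the help text, ordered
is.ordered(mixed$a)          # per the help text, a plain factor
levels(both_ordered$a)       # the shared level set, unchanged
```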

>> str(rbind(data.frame(a = factor(c(4, 3))), data.frame(a = factor(c(2, 1)))))
> 'data.frame':   4 obs. of  1 variable:
> $ a: Factor w/ 4 levels "3","4","1","2": 2 1 4 3
>
> I would expect the combined factor levels to be sorted, as long as 
> neither factor is ordered.

I would find that very undesirable: if the order matters at all, 
alphabetical order (which is highly locale-dependent) is rarely the best 
choice. In any case, if you rbind factors with the same levelset (perhaps 
the only really sensible usage), you do not want the result to have a 
different levelset.
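
If sorted levels are wanted for a particular result, they can be imposed 
explicitly after the rbind(). The sketch below is one way to do that, not 
the only one. The names lv, same, d and d_sorted are illustrative, and the 
numeric sort assumes every level is the character form of a number, as in 
the example from the original question.

```r
# Same levelset in both pieces: the result keeps that levelset as is.
lv <- c("low", "mid", "high")
same <- rbind(data.frame(a = factor("high", levels = lv)),
              data.frame(a = factor("low",  levels = lv)))
levels(same$a)    # "low" "mid" "high"

# The example from the question: levels appear in order of encounter.
d <- rbind(data.frame(a = factor(c(4, 3))),
           data.frame(a = factor(c(2, 1))))
levels(d$a)       # "3" "4" "1" "2"

# Re-level explicitly. sort() on the character levels would depend on the
# locale's collation, so sort numerically here instead (this assumes the
# levels are all numbers stored as strings).
sorted_levels <- as.character(sort(as.numeric(levels(d$a))))
d_sorted <- d
d_sorted$a <- factor(d$a, levels = sorted_levels)
levels(d_sorted$a)        # "1" "2" "3" "4"
as.character(d_sorted$a)  # the values themselves are unchanged
```

Re-levelling with factor(x, levels = ...) changes only the level order and 
the underlying integer codes, not the values the column represents.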

[And why would _you_ expect it to do something other than the help page 
says?]

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595


