[R] rbind() of factors in data.frame
Prof Brian Ripley
ripley at stats.ox.ac.uk
Fri Apr 20 00:08:26 CEST 2007
On Thu, 19 Apr 2007, Albrecht, Dr. Stefan (AZ Private Equity Partner) wrote:
> I would like to inquire, if it is a desired feature that the combination
> with rbind() of two data frames with factors columns does not sort the
> factors levels of the combined data frame.
Yes, and a documented one. To wit, the help file says
    Factors have their levels expanded as necessary (in the order of
    the levels of the levelsets of the factors encountered) and the
    result is an ordered factor if and only if all the components were
    ordered factors. (The last point differs from S-PLUS.)
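
A minimal sketch of the two rules in that help text, using made-up data
frames (d1, d2, o1, o2, u2 and the level names are illustrative only and
not from the original thread). It shows the level order following the
order in which the factors are encountered, and the result being ordered
only when every component is ordered:

```r
# Unordered factors with disjoint levelsets: levels are appended in the
# order the factors are encountered, not sorted.
d1 <- data.frame(a = factor(c("y", "x")))   # levels "x", "y"
d2 <- data.frame(a = factor(c("b", "a")))   # levels "a", "b"
combined <- rbind(d1, d2)
combined_levels <- levels(combined$a)       # "x" "y" "a" "b"

# Ordered factors sharing one levelset.
lv <- c("low", "mid", "high")
o1 <- data.frame(a = factor(c("low", "high"), levels = lv, ordered = TRUE))
o2 <- data.frame(a = factor(c("mid", "low"),  levels = lv, ordered = TRUE))
u2 <- data.frame(a = factor(c("mid", "low"),  levels = lv))  # not ordered

all_ordered <- is.ordered(rbind(o1, o2)$a)  # every component is ordered
mixed       <- is.ordered(rbind(o1, u2)$a)  # one component is unordered
```

Under the documented behaviour one would expect all_ordered to be TRUE
and mixed to be FALSE.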
> > str(rbind(data.frame(a = factor(c(4, 3))), data.frame(a = factor(c(2, 1)))))
> 'data.frame': 4 obs. of 1 variable:
> $ a: Factor w/ 4 levels "3","4","1","2": 2 1 4 3
>
> I would expect the combined factor levels to be sorted, as long as both
> factors are not ordered.
I would find that very undesirable: if the order matters at all, it seems
rare that alphabetical order (which is highly locale-dependent) is optimal. In any
case, if you rbind factors with the same levelset (perhaps the only really
sensible usage), you do not want the result to have a different levelset.
[And why would _you_ expect it to do something other than the help page
says?]
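
As a rough illustration of both points (the object names below are
hypothetical, and re-levelling with factor() is just one way to do it,
not something prescribed in this thread): rbind() leaves a shared
levelset untouched, and anyone who does want sorted levels can ask for
them explicitly afterwards.

```r
# Same levelset in both data frames: the result keeps that levelset,
# in its original (non-alphabetical) order.
lv <- c("low", "mid", "high")
d1 <- data.frame(a = factor(c("high", "low"), levels = lv))
d2 <- data.frame(a = factor(c("mid", "mid"),  levels = lv))
same <- rbind(d1, d2)
kept <- identical(levels(same$a), lv)

# The example from the original question: levels come out as
# "3" "4" "1" "2". Sorting them is an explicit, separate step.
x <- rbind(data.frame(a = factor(c(4, 3))),
           data.frame(a = factor(c(2, 1))))
values_before <- as.character(x$a)
# sort() on character levels is locale-dependent in general; these
# single-digit labels sort the same way everywhere.
x$a <- factor(x$a, levels = sort(levels(x$a)))
values_after <- as.character(x$a)
```

Re-levelling this way changes only the levels attribute and the
underlying integer codes; the values themselves are unchanged.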
--
Brian D. Ripley, ripley at stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595