[Rd] Unexpected behavior in factor level ordering

Uwe Ligges ligges at statistik.tu-dortmund.de
Sat Feb 25 19:25:49 CET 2012



On 25.02.2012 19:16, Paul Johnson wrote:
> Hello, Everybody:
>
> This may not be a "bug", but for me it is an unexpected outcome. A
> factor variable's levels do not retain their ordering after the levels
> function is used. I supply an example in which a factor with values
> "BC" "AD" (in that order) is unintentionally re-alphabetized by the
> levels function.
>
> To me, this is very bad behavior. Would you agree?
>
>
> # Paul Johnson 2012-02-05
>
> x <- c("AD","BC","AD","BC","AD","BC")
> xf <- factor(x, levels=c("BC", "AD"), labels=c("Before Christ","After Christ"))
> y <- rnorm(6)
>
> m1 <- lm(y ~ xf)
>
> plot(y ~ xf)
>
> abline(m1)
> ## Just a little problem: the line does not "go through" the box
> ## plot in the right spot, because contrasts(xf) is 0,1 but
> ## the plot places xf at 1,2.
>
> xlevels <- levels(xf)
> newdf <- data.frame(xf=xlevels)
>
> ypred <- predict(m1, newdata=newdf)
>
> ## Watch now: the plot comes out "reversed", AD before BC
> plot(ypred ~ newdf$xf)
>
> ## Ah. Now I see:
>
> levels(newdf$xf)
> ## Why doesn't newdf$xf respect the ordering of the levels?


Because xlevels was a character vector, and you coerced it to a factor by
calling data.frame(xf=xlevels) without saying anything about the
ordering; hence its levels were sorted lexicographically.
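A minimal sketch of the fix implied above, assuming the objects from the
original example (the set.seed() call and the names `bad` and `ok` are
illustrative additions). The point is to pass the desired ordering to
factor() explicitly via its `levels` argument. Note that since R 4.0.0,
data.frame() defaults to stringsAsFactors = FALSE, so data.frame(xf=xlevels)
would keep a character column and levels(newdf$xf) would return NULL.

```r
# Rebuild the objects from the original example
set.seed(1)  # added only to make the example reproducible
x  <- c("AD", "BC", "AD", "BC", "AD", "BC")
xf <- factor(x, levels = c("BC", "AD"),
             labels = c("Before Christ", "After Christ"))
y  <- rnorm(6)
m1 <- lm(y ~ xf)

# levels() returns a plain character vector, in the factor's own order
xlevels <- levels(xf)

# Coercing without explicit levels: factor() sorts the unique values
bad <- factor(xlevels)
levels(bad)  # "After Christ" "Before Christ"

# Supplying the levels explicitly preserves the original ordering
ok    <- factor(xlevels, levels = xlevels)
newdf <- data.frame(xf = ok)
levels(newdf$xf)  # "Before Christ" "After Christ"

# Predictions now come back in the same order as levels(xf)
ypred <- predict(m1, newdata = newdf)
```

With the ordering preserved, plot(ypred ~ newdf$xf) draws the groups in the
same order as plot(y ~ xf).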

Uwe Ligges





More information about the R-devel mailing list