[R] Ordering categories on a boxplot - a serious trap??

Schwab,Wilhelm K bschwab at anest.ufl.edu
Fri Feb 26 02:40:30 CET 2010


Phil,

That works[*], but I still think there is a big problem given how easy it is to do the wrong thing, and that searches lead to dangerous instructions.  Hopefully this will serve to keep others out of trouble, but so might an immutable return value from levels().

[*] I have not yet done anything with selecting parts of the data frame.  Using a separate factor, I quickly hit trouble with size mismatches, though I could probably work around them by recreating the factor after any such change.  Proceeding with caution...
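
For anyone hitting the same size mismatches: one possible way around them is to store the reordered factor back in the data frame rather than in a separate variable, so that row subsetting carries it along. The sketch below uses made-up data in place of factor-order.txt (the values and the `dose > 5` filter are assumptions for illustration only), and it relies on droplevels(), which exists in current R versions but may not be available in older ones.

```r
# Made-up data; the real factor-order.txt is not available here.
v <- data.frame(
  doseGroup = c("H", "L", "M", "H", "L", "M"),
  dose      = c(30, 1, 10, 32, 2, 11)
)

# Store the reordered factor in the data frame itself,
# not in a separate variable that can fall out of sync.
v$doseGroup <- factor(v$doseGroup, levels = c("L", "M", "H"))

# Row subsetting keeps the column aligned with the other columns
# and preserves the level order (unused levels are kept too).
sub <- v[v$dose > 5, ]
levels_before <- levels(sub$doseGroup)

# droplevels() removes unused levels; the remaining order is unchanged.
sub <- droplevels(sub)
levels_after <- levels(sub$doseGroup)
```

With the factor living in the data frame, a call such as boxplot(dose ~ doseGroup, data = sub) should pick up the intended order without factor() being repeated for each subset.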

Bill

---
Wilhelm K. Schwab, Ph.D.



-----Original Message-----
From: Phil Spector [mailto:spector at stat.berkeley.edu]
Sent: Thursday, February 25, 2010 7:06 PM
To: Schwab,Wilhelm K
Subject: Re: [R] Ordering categories on a boxplot - a serious trap??

Wilhelm -
    I don't know if this is correct for your problem because you didn't provide a reproducible example, but perhaps you could try

v$doseGroup = factor(v$doseGroup,levels=c("L", "M", "H"))

instead of setting the levels directly.
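
To make the difference between the two approaches concrete, here is a minimal sketch with made-up data standing in for factor-order.txt (the values are assumptions, not taken from the original file). It compares assigning to levels(), which renames the existing levels position by position, with calling factor(), which only changes their order.

```r
# Made-up data standing in for factor-order.txt (hypothetical values).
v <- data.frame(
  doseGroup = factor(c("H", "L", "M", "H", "L", "M")),
  dose      = c(30, 1, 10, 32, 2, 11)
)

# The default level order is alphabetical: "H" "L" "M".
default_levels <- levels(v$doseGroup)

# Relabeling: assigning to levels() renames the levels by position,
# so every "H" row becomes "L", "L" becomes "M", and "M" becomes "H".
relabeled <- v$doseGroup
levels(relabeled) <- c("L", "M", "H")

# Reordering: factor() with levels= changes only the order;
# each row keeps its original label.
reordered <- factor(v$doseGroup, levels = c("L", "M", "H"))

# Group means make the difference visible.
means_relabeled <- tapply(v$dose, relabeled, mean)
means_reordered <- tapply(v$dose, reordered, mean)
```

Both factors report the levels in the order L, M, H, but only the reordered one keeps each dose value attached to its true group; the relabeled one silently moves the high doses under the "L" label.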

 					- Phil Spector
 					 Statistical Computing Facility
 					 Department of Statistics
 					 UC Berkeley
 					 spector at stat.berkeley.edu

On Thu, 25 Feb 2010, Schwab,Wilhelm K wrote:

> Hello all,
>
> I think I probably did something stupid, and R's part was to allow me to do it.  My goal was to control the order of factor levels appearing horizontally on a boxplot.  Enter search engines and perhaps some creative stupidity on my part, and I came up with the following:
>
> 	v=read.table("factor-order.txt",header=TRUE);
> 	levels(v$doseGroup) = c("L", "M", "H");
> 	boxplot(v$dose~v$doseGroup);
>
>
> A good way to see the trap is to evaluate:
>
> 	v=read.table("factor-order.txt",header=TRUE);
> 	par(mfrow=c(2,1));
> 	boxplot(v$dose~v$doseGroup);
> 	levels(v$doseGroup) = c("L", "M", "H");
> 	boxplot(v$dose~v$doseGroup);
> 	par(mfrow=c(1,1));
>
> The above creates two plots, one correct with the factor levels in an inconvenient order, and one that is WRONG.  In the latter, the labels appear in the desired order, but the data does not "move with them."  I did not discover the problem until I repeated the same type of plot with something that had a known relationship with the levels, and the result was clearly not correct.
>
> I *think* the problem is assigning to the return value of levels().
> How did I think to do that?  I'm not really sure, but please look at
>
>  https://stat.ethz.ch/pipermail/r-help/2008-August/171884.html
>
>
> Perhaps it does not say to do exactly what I did, but it sure was easy to follow to the mistake, it appeared to do what I wanted, and the consequences of the mistake are ugly.  Perhaps levels() should return something that is immutable??  If I am looking at this correctly, levels() is an accident waiting to happen.
>
> What should I have done?  It seems:
>
> 	# read data and order factor levels
> 	v=read.table("factor-order.txt",header=TRUE);
> 	group = factor(v$doseGroup,levels = c("L", "M", "H") );
> 	boxplot(v$dose~group);
>
>
> One disappointment is that the above factor() call apparently needs to be repeated for any subset of v - I'm still trying to get my mind around that one.
>
> Can anyone confirm this?  It strikes me as a trap that should be addressed so that an error results rather than a garbage graph.
>
> Bill
>
>
> ---
> Wilhelm K. Schwab, Ph.D.
>


