[R] Ordering categories on a boxplot - a serious trap??

Schwab,Wilhelm K bschwab at anest.ufl.edu
Fri Feb 26 00:51:11 CET 2010


Hello all,

I think I probably did something stupid, and R's part was to allow me to do it.  My goal was to control the order of factor levels appearing horizontally on a boxplot.  Enter search engines and perhaps some creative stupidity on my part, and I came up with the following:

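	# stringsAsFactors = TRUE is needed in R >= 4.0 for doseGroup to be a factor
	v <- read.table("factor-order.txt", header = TRUE, stringsAsFactors = TRUE)
	# WRONG: this renames the existing levels, it does not reorder them
	levels(v$doseGroup) <- c("L", "M", "H")
	boxplot(dose ~ doseGroup, data = v)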
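The assignment on the second line does not reorder anything. levels(x) <- value calls the replacement function levels<-, which renames the existing levels position by position while the underlying integer codes stay where they are. A minimal sketch with a made-up four-element factor (hypothetical data, not the contents of factor-order.txt):

```r
# Hypothetical data: four observations, default (alphabetical) levels H, L, M
x <- factor(c("H", "L", "M", "L"))
levels(x)          # "H" "L" "M"
as.integer(x)      # 1 2 3 2  (the stored codes)

# Renaming: level 1 ("H") becomes "L", level 2 ("L") becomes "M",
# level 3 ("M") becomes "H"
levels(x) <- c("L", "M", "H")
as.integer(x)      # still 1 2 3 2: the codes did not move
as.character(x)    # "L" "M" "H" "M": every observation now has the wrong label
```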


A good way to see the trap is to evaluate:

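	v <- read.table("factor-order.txt", header = TRUE, stringsAsFactors = TRUE)
	op <- par(mfrow = c(2, 1))                 # two plots, one above the other
	boxplot(dose ~ doseGroup, data = v)        # correct, alphabetical order H, L, M
	levels(v$doseGroup) <- c("L", "M", "H")    # relabels only
	boxplot(dose ~ doseGroup, data = v)        # labels L, M, H; boxes have not moved
	par(op)                                    # restore the previous layout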

The above creates two plots: one correct, with the levels in an inconvenient (alphabetical) order, and one that is WRONG.  In the latter, the labels appear in the desired order, but the data do not "move with them."  I did not discover the problem until I repeated the same type of plot with something that had a known relationship with the levels, and the result was clearly not correct.
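The kind of check that exposed the problem can be scripted without drawing anything. A sketch, assuming a small made-up data frame in which each group's dose is known in advance (L = 1, M = 2, H = 3); the per-group means from tapply() show the labels and the data disagreeing after the relabelling:

```r
# Hypothetical data with a known relationship: dose is 1 for L, 2 for M, 3 for H
v <- data.frame(dose = c(1, 1, 2, 2, 3, 3),
                doseGroup = factor(c("L", "L", "M", "M", "H", "H")))

good <- tapply(v$dose, v$doseGroup, mean)   # H = 3, L = 1, M = 2 (correct)

bad <- v
levels(bad$doseGroup) <- c("L", "M", "H")   # the trap: relabels, does not reorder
wrong <- tapply(bad$dose, bad$doseGroup, mean)
wrong                                       # L = 3, M = 1, H = 2 (labels and data disagree)
```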

I *think* the problem is assigning to levels(), as in levels(v$doseGroup) = ..., which relabels the levels in place instead of reordering them.  How did I think to do that?  I'm not really sure, but please look at

  https://stat.ethz.ch/pipermail/r-help/2008-August/171884.html


Perhaps it does not say to do exactly what I did, but it was easy to follow it straight into the mistake, the result appeared to do what I wanted, and the consequences of the mistake are ugly.  Perhaps levels() should return something that is immutable??  If I am reading this correctly, levels() is an accident waiting to happen.
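On the immutability question: the vector that levels() returns is already an ordinary copy, so modifying it cannot touch the factor. What changes the factor is the replacement syntax levels(x) <- value, which R evaluates as a call to the function levels<- followed by reassignment to x. A short sketch with a hypothetical factor:

```r
x <- factor(c("H", "L", "M"))

# Modifying the returned vector leaves the factor alone
lv <- levels(x)
lv[1] <- "Z"
levels(x)                          # still "H" "L" "M"

# The replacement syntax is shorthand for calling `levels<-` and reassigning
y <- `levels<-`(x, c("L", "M", "H"))
levels(x) <- c("L", "M", "H")
identical(x, y)                    # TRUE
```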

What should I have done?  It seems:

	# read data and order factor levels
	v <- read.table("factor-order.txt", header = TRUE)
	# factor() matches each value by its label, so the data move with the labels
	group <- factor(v$doseGroup, levels = c("L", "M", "H"))
	boxplot(v$dose ~ group)
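Unlike the relabelling, factor(x, levels = ...) matches each observation by its label and only changes the order of the levels. A sketch using the same kind of hypothetical data as above; boxplot(..., plot = FALSE) returns the box statistics without drawing, which makes the ordering easy to check:

```r
# Hypothetical data: dose is 1 for L, 2 for M, 3 for H
dose <- c(1, 1, 2, 2, 3, 3)
doseGroup <- factor(c("L", "L", "M", "M", "H", "H"))   # levels H, L, M

group <- factor(doseGroup, levels = c("L", "M", "H"))
b <- boxplot(dose ~ group, plot = FALSE)
b$names        # "L" "M" "H"
b$stats[3, ]   # medians 1 2 3: each box still holds its own data
```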


One disappointment is that the above factor() call apparently needs to be repeated for any subset of v - I'm still trying to get my mind around that one.
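The repetition is needed only because group lives outside v. A sketch, assuming the reordered factor is stored back into the data frame instead: subset() keeps a factor's levels and their order, and droplevels() removes levels a subset no longer uses without changing the order of the rest. The data frame below is a hypothetical stand-in for factor-order.txt:

```r
# Hypothetical stand-in for read.table("factor-order.txt", header = TRUE)
v <- data.frame(dose = c(1, 1, 2, 2, 3, 3),
                doseGroup = c("L", "L", "M", "M", "H", "H"))

# Reorder once, inside the data frame
v$doseGroup <- factor(v$doseGroup, levels = c("L", "M", "H"))

high <- subset(v, dose > 1)
levels(high$doseGroup)               # "L" "M" "H": order inherited, "L" now unused
levels(droplevels(high$doseGroup))   # "M" "H"

boxplot(dose ~ doseGroup, data = droplevels(high), plot = FALSE)$names   # "M" "H"
```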

Can anyone confirm this?  It strikes me as a trap that should be addressed so that an error results rather than a garbage graph.
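Short of a change in R itself, the "error rather than a garbage graph" behaviour can be had with a small check. A minimal sketch of a hypothetical helper, reorder_levels() (not part of base R), that reorders with factor() but stops when the requested levels are not exactly a permutation of the existing ones. That check also catches typos, which factor(x, levels = ...) would otherwise silently turn into NA:

```r
# Hypothetical helper, not part of base R: reorder levels, refuse anything else
reorder_levels <- function(f, new_levels) {
  stopifnot(is.factor(f))
  if (length(new_levels) != nlevels(f) ||
      !setequal(new_levels, levels(f))) {
    stop("new_levels must be a permutation of levels(f): ",
         paste(levels(f), collapse = ", "))
  }
  factor(f, levels = new_levels)   # matches by label, so data keep their labels
}

g <- factor(c("H", "L", "M", "L"))
g2 <- reorder_levels(g, c("L", "M", "H"))
levels(g2)            # "L" "M" "H"
as.character(g2)      # "H" "L" "M" "L": data unchanged

try(reorder_levels(g, c("L", "M", "h")))   # an error instead of silent NAs
```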

Bill


---
Wilhelm K. Schwab, Ph.D.


