[R] Ordering categories on a boxplot - a serious trap??

Ista Zahn istazahn at gmail.com
Fri Feb 26 04:14:56 CET 2010


Hi Wilhelm,
I agree it's confusing to have a levels() function that does something
so different from the levels argument of the factor function.
Personally I use levels() only as an extractor, never to change
levels. For that I use factor(), with the levels and labels arguments
as needed.
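
To make the difference concrete, here is a minimal sketch. The data frame is made up for illustration, since the original factor-order.txt was not posted; only the column names dose and doseGroup come from the thread. It contrasts assigning to levels(), which relabels the existing levels by position, with factor(..., levels=), which reorders the levels and leaves each observation's label alone.

```r
# Hypothetical stand-in for factor-order.txt (values invented for illustration)
v <- data.frame(
  dose      = c(10, 12, 50, 55, 100, 110),
  doseGroup = factor(c("L", "L", "M", "M", "H", "H"))
)
levels(v$doseGroup)   # default order is alphabetical: "H" "L" "M"

# Assigning to levels() renames by position:
# the first level "H" becomes "L", "L" becomes "M", and "M" becomes "H"
relabelled <- v$doseGroup
levels(relabelled) <- c("L", "M", "H")

# factor() with the levels argument only changes the ordering of the levels
reordered <- factor(v$doseGroup, levels = c("L", "M", "H"))

# Side-by-side comparison: 'relabelled' no longer matches the original labels
data.frame(original = v$doseGroup, relabelled, reordered)
```

Storing the reordered factor back in the data frame (v$doseGroup <- reordered) should keep the level order attached to the column, so rows selected from v later carry it along.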

Best,
Ista

On Thu, Feb 25, 2010 at 8:40 PM, Schwab,Wilhelm K <bschwab at anest.ufl.edu> wrote:
> Phil,
>
> That works[*], but I still think there is a big problem given how easy it is to do the wrong thing, and that searches lead to dangerous instructions.  Hopefully this will serve to keep others out of trouble, but so might an immutable return value from levels().
>
> [*] I have not yet done anything with selecting parts of the data frame.  Using a separate factor, I quickly hit trouble with size mismatches, though I could probably work around them by recreating the factor after any such change.  Proceeding with caution...
>
> Bill
>
> ---
> Wilhelm K. Schwab, Ph.D.
>
>
>
> -----Original Message-----
> From: Phil Spector [mailto:spector at stat.berkeley.edu]
> Sent: Thursday, February 25, 2010 7:06 PM
> To: Schwab,Wilhelm K
> Subject: Re: [R] Ordering categories on a boxplot - a serious trap??
>
> Wilhelm -
>    I don't know if this is correct for your problem because you didn't provide a reproducible example, but perhaps you could try
>
> v$doseGroup = factor(v$doseGroup,levels=c("L", "M", "H"))
>
> instead of setting the levels directly.
>
>                                        - Phil Spector
>                                         Statistical Computing Facility
>                                         Department of Statistics
>                                         UC Berkeley
>                                         spector at stat.berkeley.edu
>
> On Thu, 25 Feb 2010, Schwab,Wilhelm K wrote:
>
>> Hello all,
>>
>> I think I probably did something stupid, and R's part was to allow me to do it.  My goal was to control the order of factor levels appearing horizontally on a boxplot.  Enter search engines and perhaps some creative stupidity on my part, and I came up with the following:
>>
>>       v=read.table("factor-order.txt",header=TRUE);
>>       levels(v$doseGroup) = c("L", "M", "H");
>>       boxplot(v$dose~v$doseGroup);
>>
>>
>> A good way to see the trap is to evaluate:
>>
>>       v=read.table("factor-order.txt",header=TRUE);
>>       par(mfrow=c(2,1));
>>       boxplot(v$dose~v$doseGroup);
>>       levels(v$doseGroup) = c("L", "M", "H");
>>       boxplot(v$dose~v$doseGroup);
>>       par(mfrow=c(1,1));
>>
>> The above creates two plots, one correct with the factor levels in an inconvenient order, and one that is WRONG.  In the latter, the labels appear in the desired order, but the data does not "move with them."  I did not discover the problem until I repeated the same type of plot with something that had a known relationship with the levels, and the result was clearly not correct.
>>
>> I *think* the problem is to assign to the return value of levels().
>> How did I think to do that?  I'm not really sure, but please look at
>>
>>  https://stat.ethz.ch/pipermail/r-help/2008-August/171884.html
>>
>>
>> Perhaps it does not say to do exactly what I did, but it sure was easy to follow to the mistake, it appeared to do what I wanted, and the consequences of the mistake are ugly.  Perhaps levels() should return something that is immutable??  If I am looking at this correctly, levels() is an accident waiting to happen.
>>
>> What should I have done?  It seems:
>>
>>       # read data and order factor levels
>>       v=read.table("factor-order.txt",header=TRUE);
>>       group = factor(v$doseGroup,levels = c("L", "M", "H") );
>>       boxplot(v$dose~group);
>>
>>
>> One disappointment is that the above factor() call apparently needs to be repeated for any subset of v - I'm still trying to get my mind around that one.
>>
>> Can anyone confirm this?  It strikes me as a trap that should be addressed so that an error results rather than a garbage graph.
>>
>> Bill
>>
>>
>> ---
>> Wilhelm K. Schwab, Ph.D.
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>



-- 
Ista Zahn
Graduate student
University of Rochester
Department of Clinical and Social Psychology
http://yourpsyche.org


