[R] Whiskers on the default boxplot {graphics}

David Winsemius dwinsemius at comcast.net
Thu May 13 16:55:21 CEST 2010


On May 13, 2010, at 10:25 AM, Robert Baer wrote:

>> Hi Peter,
>>
>> You're absolutely correct!  The description of 'range' in the
>> 'boxplot' help file is a little confusing because it uses the words
>> "interquartile range". I think it should be changed to "length
>> of the box" to be exact and consistent with the wording in the help
>> file for "boxplot.stats".
>
> The issue is probably that there are multiple ways (9 to be exact)
> of defining quantiles in R.  See the 'type=' argument in ?quantile.
> The quantile function uses type=7 by default, which matches the
> quantile definition used by S-Plus(?), but differs from that used by
> SPSS.  Doesn't fivenum essentially use the equivalent of a different
> "type=" argument (maybe 2 or 5) in constructing the hinges?
>
> It seems perfectly reasonable to talk about 'length of box' (or 'box
> height', depending on how you display the boxplot), but aren't the
> hinges simply Q1 and Q3 defined by one of the possible quartile
> definitions (as Peter points out, the one used by fivenum)?  The box
> height does not necessarily match the distance produced by IQR(),
> which also seems to use the equivalent of quantile(..., type=7), but
> it is still an IQR, is it not?
>
> Quantiles apparently can be defined in more than one "acceptable"
> way (sort of like dealing with ties in rank statistics).  The OP
> seemed to want an "exact" explanation of the whiskers, and I think
> Peter has pointed us to the definition of quartiles used by fivenum,
> as opposed to the default used with quantile(..., type=7).

Yes, and experimentation leads me to the conclusion that the only
possible candidate for matching the results of fivenum(y)[c(2,4)] with
quantile(y, c(1,3)/4, type=i) is type=5. I'm not able to prove
that to myself from mathematical arguments, since I do not quite
understand the formalism on the ?quantile help page. If the match is
not exact, this would be a tenth definition of IQR.
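A short derivation may help here. It assumes the index arithmetic in R's stats::fivenum source: after sorting, the lower hinge sits at position d = floor((n+3)/2)/2, the upper hinge at n + 1 - d, and when d ends in .5 the two neighbouring order statistics are averaged. Type 5 in ?quantile puts the p-th quantile at position np + 1/2 and interpolates linearly, so equal positions give equal values. On that basis, the type=5 match is exact for even n but not for odd n:

```latex
% Type 5: the p-th quantile sits at (1-based) position
h_5(p) = np + \tfrac{1}{2}
% fivenum: lower hinge at d = \lfloor (n+3)/2 \rfloor / 2, upper hinge at n + 1 - d
n \text{ even: } \left\lfloor \tfrac{n+3}{2} \right\rfloor = \tfrac{n+2}{2}
  \;\Rightarrow\; d = \tfrac{n}{4} + \tfrac{1}{2} = h_5(\tfrac14),
  \qquad n + 1 - d = \tfrac{3n}{4} + \tfrac{1}{2} = h_5(\tfrac34)
n \text{ odd: } \left\lfloor \tfrac{n+3}{2} \right\rfloor = \tfrac{n+3}{2}
  \;\Rightarrow\; d = \tfrac{n}{4} + \tfrac{3}{4} \neq h_5(\tfrac14)
```

For the n = 300 example, both positions are 75.5 (lower) and 225.5 (upper). That is consistent with the type=5 row below reproducing the fivenum hinges.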

 > set.seed(123)
 >  y <- rexp(300, .02)
 > fivenum(y)
[1]   0.2183685  15.8740466  42.1147820  74.0362517 360.5503788
 > for (i in 4:9) {print(quantile(y, c(1,3)/4, type=i) ) }
      25%      75%
15.82506 73.93080
      25%      75%
15.87405 74.03625
      25%      75%
15.84955 74.08898
      25%      75%
15.89854 73.98352
      25%      75%
15.86588 74.05383
      25%      75%
15.86792 74.04943
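The experiment can be extended across several sample sizes, as a minimal sketch. The helper hinges_match_type5() is hypothetical and written only for this check. fivenum(), quantile() and IQR() are the stats functions discussed in this thread, and IQR()'s type argument is passed on to quantile(). The random draws differ from the y above, so only the TRUE/FALSE pattern is of interest:

```r
## Sketch: do fivenum()'s hinges coincide with quantile(type = 5)
## for every sample size, or only for some?
## hinges_match_type5() is a hypothetical helper for this check.
hinges_match_type5 <- function(x) {
  hinges <- fivenum(x)[c(2, 4)]                     # lower and upper hinge
  q5 <- unname(quantile(x, c(1, 3) / 4, type = 5))  # type-5 quartiles
  isTRUE(all.equal(hinges, q5))
}

set.seed(123)
ns <- 5:12
matches <- sapply(ns, function(n) hinges_match_type5(rexp(n, 0.02)))
print(data.frame(n = ns, even = ns %% 2 == 0, match = matches))

## Box length (hinge spread) versus IQR() under two quantile types;
## n = 300 is even, so the type-5 IQR should equal the box length.
y <- rexp(300, 0.02)
box_len <- diff(fivenum(y)[c(2, 4)])
print(c(box = box_len, IQR_type7 = IQR(y), IQR_type5 = IQR(y, type = 5)))
```

If the position argument is right, the match column should follow the even column. In that case the box length equals IQR(y, type = 5) only for even n, and the default IQR(y), which uses type 7, generally differs from it.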

-- 
David.

>
> All that said, I'm not convinced that it is wrong to speak of  
> "interquartile range" in 'boxplot' help.
>
> Rob
