[R] Whiskers on the default boxplot {graphics}
David Winsemius
dwinsemius at comcast.net
Thu May 13 16:55:21 CEST 2010
On May 13, 2010, at 10:25 AM, Robert Baer wrote:
>> Hi Peter,
>>
>> You're absolutely correct! The description for 'range' in
>> 'boxplot' help file is a little bit confusing by using the words
>> "interquartile range". I think it should be changed to the "length
>> of the box" to be exact and consistent with those in the help file
>> for "boxplot.stats".
>
> The issue is probably that there are multiple ways (9 to be exact)
> of defining quantiles in R. See the 'type=' argument in ?quantile.
> The quantile function uses type=7 by default, which matches the
> quantile definition used by S-Plus(?) but differs from that used by
> SPSS. Doesn't fivenum essentially use the equivalent of a different
> 'type=' argument (maybe 2 or 5) in constructing the hinges?
>
> It seems perfectly reasonable to talk about 'length of box' (or 'box
> height' depending how you display the boxplot), but aren't the
> hinges simply Q1 and Q3 defined by one of the possible quartile
> definitions (as Peter points out the one used by fivenum)? The box
> height does not necessarily match the distance produced by IQR()
> which also seems to use the equivalent of quantile(..., type=7), but
> it is still an IQR, is it not?
>
> Quantiles apparently can be defined in more than one "acceptable"
> way (sort of like dealing with ties in rank statistics). The OP
> seemed to want an "exact" explanation of the whiskers, and I think
> Peter has pointed us at the definition of quartiles used by fivenum,
> as opposed to the default used with quantile(..., type=7).
Yes, and experimentation leads me to the conclusion that, among
types 4 to 9, the only candidate for matching the results of
fivenum(y)[c(2,4)] with quantile(y, c(1,3)/4, type=i) is type=5.
I'm not able to prove that to myself from mathematical arguments,
since I do not quite understand the formalism on the ?quantile help
page. If the match is not exact, this would be a tenth definition of
IQR.
> set.seed(123)
> y <- rexp(300, .02)
> fivenum(y)
[1] 0.2183685 15.8740466 42.1147820 74.0362517 360.5503788
> for (i in 4:9) {print(quantile(y, c(1,3)/4, type=i) ) }
     25%      75%
15.82506 73.93080
     25%      75%
15.87405 74.03625
     25%      75%
15.84955 74.08898
     25%      75%
15.89854 73.98352
     25%      75%
15.86588 74.05383
     25%      75%
15.86792 74.04943
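
The loop above covers only types 4 to 9 and a single sample size. A
rough sketch that extends the check to all nine types and to a few
small samples might look like the following. The helper name
matching_types and the choice of sample sizes are illustrative
assumptions, not anything taken from the help pages.

```r
# Hypothetical helper: which quantile() types reproduce the fivenum()
# hinges for a given sample? all.equal() tolerates rounding error.
matching_types <- function(y) {
  hinges <- fivenum(y)[c(2, 4)]
  ok <- vapply(1:9, function(i) {
    q <- quantile(y, c(1, 3) / 4, type = i, names = FALSE)
    isTRUE(all.equal(q, hinges))
  }, logical(1))
  which(ok)
}

set.seed(123)
y <- rexp(300, 0.02)          # same sample as in the transcript above
matching_types(y)

# Distinct values 1..n make the hinges easy to verify by hand.
for (n in 4:9) {
  cat("n =", n, " matching types:",
      matching_types(as.numeric(seq_len(n))), "\n")
}
```

On the exponential sample, type 5 should be among the matches,
consistent with the output above. For n = 5 the hinges are the 2nd
and 4th order statistics, which type 5 does not return, so it looks
as if no single 'type' reproduces the hinges for every n.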
--
David.
>
> All that said, I'm not convinced that it is wrong to speak of
> "interquartile range" in 'boxplot' help.
>
> Rob
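
As for the whiskers themselves, the rule described in ?boxplot.stats
can be reproduced by hand. This is only a sketch, assuming the
default multiplier of 1.5 (the 'range' argument of boxplot, 'coef' in
boxplot.stats); the variable names are made up for illustration.

```r
set.seed(123)
y <- rexp(300, 0.02)

whisker_coef <- 1.5                 # default 'range' in boxplot()
hinges  <- fivenum(y)[c(2, 4)]      # box edges used by boxplot.stats()
box_len <- diff(hinges)             # the "length of the box"

# Whiskers reach the most extreme data points lying no more than
# whisker_coef * box_len beyond the hinges.
lower_fence <- hinges[1] - whisker_coef * box_len
upper_fence <- hinges[2] + whisker_coef * box_len
whiskers <- range(y[y >= lower_fence & y <= upper_fence])

whiskers
boxplot.stats(y, coef = whisker_coef)$stats[c(1, 5)]  # for comparison

# Box length versus IQR(), which defaults to quantile(..., type = 7).
c(box_length = box_len, IQR = IQR(y))
```

The two whisker vectors should agree, while the box length and IQR(y)
differ slightly for this sample because the hinges and the type=7
quartiles are not the same numbers.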