[R] Whiskers on the default boxplot {graphics}
David Winsemius
dwinsemius at comcast.net
Thu May 13 18:43:41 CEST 2010
On May 13, 2010, at 12:18 PM, Robert Baer wrote:
> And try this (which seems to leave us with type=2); it is listed in
> ?quantile as "Discontinuous sample quantile types 1, 2, and 3"
>> quantile(1:101, c(1,3)/4, type=2)
> 25% 75%
> 26 76
I think Peter may be right. If I do it with the rnorm function I
repeatedly get the same result for fivenum(x)[2] and the type 2 first
quartile. I did not test types 1 to 3 because they were designed for
discrete-valued variables, but I suppose everything is really discrete
on computers, eh?
> fivenum(x <- rnorm(101) )
[1] -2.6224338 -0.9682586 -0.1897377 0.5999332 2.5409711
> quantile(x, c(1,3)/4, type=2)
25% 75%
-0.9682586 0.5999332
> fivenum(x <- rnorm(101) )
[1] -3.8251928 -0.6495966 0.1816233 0.7101774 2.3789054
> quantile(x, c(1,3)/4, type=2)
25% 75%
-0.6495966 0.7101774
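
Both sessions above use n = 101, so they say nothing about other sample sizes. A minimal sketch of a broader check follows; matching_types is a hypothetical helper (not part of base R), and it uses 1:n as data so that each value equals its rank. It only assumes the documented behaviour of fivenum() and quantile().

```r
# Which quantile() types reproduce the hinges from fivenum() for a given n?
# `matching_types` is a hypothetical helper written for this illustration.
matching_types <- function(n) {
  x <- as.numeric(seq_len(n))        # values equal their ranks
  hinges <- fivenum(x)[c(2, 4)]      # lower and upper hinge
  ok <- vapply(1:9, function(type) {
    q <- unname(quantile(x, c(1, 3) / 4, type = type))
    isTRUE(all.equal(q, hinges))
  }, logical(1))
  which(ok)
}

# One sample size from each residue class of n modulo 4
for (n in c(300, 101, 102, 103)) {
  cat("n =", n, " matching types:", matching_types(n), "\n")
}
```

In this sketch type 5 is among the matches for n = 300 but not for n = 101, and type 2 matches for n = 101 but not for n = 103, so neither type seems to reproduce the hinges for every sample size.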
--
David.
>
>> David,
>>
>> try this:
>>
>> fivenum(1:101)
>> quantile(1:101, c(1,3)/4, type=5)
>>
>> -Peter
>>
>> On 2010-05-13 8:55, David Winsemius wrote:
>>>
>>> On May 13, 2010, at 10:25 AM, Robert Baer wrote:
>>>
>>>>> Hi Peter,
>>>>>
>>>>> You're absolutely correct! The description for 'range' in the
>>>>> 'boxplot' help file is a little bit confusing by using the words
>>>>> "interquartile range". I think it should be changed to the "length
>>>>> of the box" to be exact and consistent with those in the help file
>>>>> for "boxplot.stats".
>>>>
>>>> The issue is probably that there are multiple ways (9 to be exact) of
>>>> defining quantiles in R. See the 'type=' argument in ?quantile. The
>>>> quantile function uses type=7 by default, which matches the quantile
>>>> definition used by S-Plus(?), but differs from that used by SPSS.
>>>> Doesn't fivenum essentially use the equivalent of a different "type="
>>>> argument (maybe 2 or 5) in constructing the hinges?
>>>>
>>>> It seems perfectly reasonable to talk about 'length of box' (or 'box
>>>> height' depending how you display the boxplot), but aren't the hinges
>>>> simply Q1 and Q3 defined by one of the possible quartile definitions
>>>> (as Peter points out, the one used by fivenum)? The box height does not
>>>> necessarily match the distance produced by IQR(), which also seems to
>>>> use the equivalent of quantile(..., type=7), but it is still an IQR,
>>>> is it not?
>>>>
>>>> Quantiles apparently can be defined in more than one "acceptable" way
>>>> (sort of like dealing with ties in rank statistics). The OP seemed to
>>>> want an "exact" explanation of the whiskers, and I think Peter has
>>>> pointed us at the definition of quartiles used by fivenum, as opposed
>>>> to the default used with quantile(..., type=7).
>>>
>>> Yes, and experimentation leads me to the conclusion that, among
>>> types 4 through 9, the only possible candidate for matching up the
>>> results of fivenum(y)[c(2,4)] with quantile(y, c(1,3)/4, type=i) is
>>> type=5. I'm not able to prove that to myself from mathematical
>>> arguments, since I do not quite understand the formalism in the
>>> quantile help page. If the match is not exact, this would be a tenth
>>> definition of IQR.
>>>
>>> > set.seed(123)
>>> > y <- rexp(300, .02)
>>> > fivenum(y)
>>> [1] 0.2183685 15.8740466 42.1147820 74.0362517 360.5503788
>>> > for (i in 4:9) {print(quantile(y, c(1,3)/4, type=i) ) }
>>> 25% 75%
>>> 15.82506 73.93080
>>> 25% 75%
>>> 15.87405 74.03625
>>> 25% 75%
>>> 15.84955 74.08898
>>> 25% 75%
>>> 15.89854 73.98352
>>> 25% 75%
>>> 15.86588 74.05383
>>> 25% 75%
>>> 15.86792 74.04943
>>>
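
One way to see why type 5 lines up with the hinges for n = 300 is to compare the order-statistic positions directly. This is only a sketch: hinge_pos and type5_pos are hypothetical helpers, the first copying the index arithmetic from the body of fivenum() and the second using the n * p + 1/2 position implied by the type 5 definition in ?quantile.

```r
# Order-statistic positions behind the hinges, following the arithmetic in
# the source of fivenum(). `hinge_pos` and `type5_pos` are hypothetical helpers.
hinge_pos <- function(n) {
  n4 <- floor((n + 3) / 2) / 2
  c(lower = n4, upper = n + 1 - n4)
}

# Position used by quantile(type = 5): n * p + 1/2
type5_pos <- function(n, p = c(1, 3) / 4) n * p + 0.5

hinge_pos(300); type5_pos(300)   # 75.5 and 225.5 in both cases
hinge_pos(101); type5_pos(101)   # 26 and 76 versus 25.75 and 76.25

# Cross-check the positions against fivenum() on the data from the session above
set.seed(123)
y <- sort(rexp(300, 0.02))
pos <- hinge_pos(300)
manual <- 0.5 * (y[floor(pos)] + y[ceiling(pos)])
all.equal(unname(manual), fivenum(y)[c(2, 4)])
```

When the two positions coincide, as they do for n = 300, both functions average the same pair of neighbouring order statistics. For n = 101 the positions differ, which is consistent with type 5 not reproducing fivenum(1:101).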
>>
>> --
>> Peter Ehlers
>> University of Calgary
>
David Winsemius, MD
West Hartford, CT