[R] Whiskers on the default boxplot {graphics}

David Winsemius dwinsemius at comcast.net
Thu May 13 18:43:41 CEST 2010


On May 13, 2010, at 12:18 PM, Robert Baer wrote:

> And try this (which seems to leave us with type=2); it is listed in
> ?quantile under "Discontinuous sample quantile types 1, 2, and 3":
>> quantile(1:101, c(1,3)/4, type=2)
> 25% 75%
> 26  76

I think Peter may be right. If I do it with the rnorm function I
repeatedly get the same result for fivenum(x)[2] and the type 2 first
quartile. I did not test those types because they were designed for
discrete-valued variables, but I suppose everything is really discrete
on computers, eh?
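
One way to see why the two agree for n = 101 is to reproduce the hinge
positions by hand. This is only a sketch: hinges_by_hand() is a made-up
helper, and the depth arithmetic is my reading of the body of
stats::fivenum (print fivenum at the prompt to check it against your R
version). It assumes x has no NAs.

```r
# Hypothetical helper mirroring the depth arithmetic in stats::fivenum();
# assumes x is numeric, has no NAs and length(x) >= 1.
hinges_by_hand <- function(x) {
  x  <- sort(x)
  n  <- length(x)
  n4 <- floor((n + 3) / 2) / 2          # depth of the hinges (may end in .5)
  d  <- c(n4, n + 1 - n4)               # positions of lower and upper hinge
  0.5 * (x[floor(d)] + x[ceiling(d)])   # average two neighbours if d is fractional
}

x <- rnorm(101)
hinges_by_hand(x)                       # for n = 101 these are sort(x)[26] and sort(x)[76]
all.equal(hinges_by_hand(x), fivenum(x)[c(2, 4)])
```

For n = 101 the depth is a whole number (26), so each hinge is a single
order statistic, which is also what a discontinuous quantile type returns.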

 > fivenum(x <- rnorm(101) )
[1] -2.6224338 -0.9682586 -0.1897377  0.5999332  2.5409711
 > quantile(x, c(1,3)/4, type=2)
        25%        75%
-0.9682586  0.5999332

 > fivenum(x <- rnorm(101) )
[1] -3.8251928 -0.6495966  0.1816233  0.7101774  2.3789054
 > quantile(x, c(1,3)/4, type=2)
        25%        75%
-0.6495966  0.7101774
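
Since the earlier experiment only covered types 4 to 9 at one sample size,
a broader check may be useful. The sketch below compares the fivenum hinges
with all nine quantile types at four consecutive sample sizes, so that every
remainder of n modulo 4 is covered. hinge_match() is a hypothetical helper,
not part of base R, and the choice of 100:103 is arbitrary.

```r
# Which quantile() types reproduce the fivenum() hinges for a given sample?
# hinge_match() is a hypothetical helper, not part of base R.
hinge_match <- function(x) {
  hinges <- fivenum(x)[c(2, 4)]
  vapply(1:9, function(i) {
    q <- as.numeric(quantile(x, c(1, 3) / 4, type = i))  # drop the "25%"/"75%" names
    isTRUE(all.equal(q, hinges))
  }, logical(1))
}

# One row per sample size, one column per type.
sizes <- 100:103
res <- t(sapply(sizes, function(n) hinge_match(seq_len(n))))
dimnames(res) <- list(paste0("n=", sizes), paste0("type", 1:9))
print(res)
```

A type only counts as "the" fivenum definition if its column is TRUE in
every row. Type 2 agrees at n = 101 and type 5 does not, but type 2 is not
safe in general either: for 1:7 fivenum gives hinges 2.5 and 5.5 while
type 2 gives 2 and 6.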

-- 
David.
>
>> David,
>>
>> try this:
>>
>> fivenum(1:101)
>> quantile(1:101, c(1,3)/4, type=5)
>>
>> -Peter
>>
>> On 2010-05-13 8:55, David Winsemius wrote:
>>>
>>> On May 13, 2010, at 10:25 AM, Robert Baer wrote:
>>>
>>>>> Hi Peter,
>>>>>
>>>>> You're absolutely correct! The description for 'range' in the 'boxplot'
>>>>> help file is a little bit confusing in using the words "interquartile
>>>>> range". I think it should be changed to the "length of the box" to be
>>>>> exact and consistent with the help file for "boxplot.stats".
>>>>
>>>> The issue is probably that there are multiple ways (9 to be exact) of
>>>> defining quantiles in R. See the 'type=' argument in ?quantile. The
>>>> quantile function uses type=7 by default, which matches the quantile
>>>> definition used by S-Plus(?) but differs from that used by SPSS.
>>>> Doesn't fivenum essentially use the equivalent of a different "type="
>>>> argument (maybe 2 or 5) in constructing the hinges?
>>>>
>>>> It seems perfectly reasonable to talk about 'length of box' (or 'box
>>>> height', depending on how you display the boxplot), but aren't the hinges
>>>> simply Q1 and Q3 defined by one of the possible quartile definitions
>>>> (as Peter points out, the one used by fivenum)? The box height does not
>>>> necessarily match the distance produced by IQR(), which also seems to
>>>> use the equivalent of quantile(..., type=7), but it is still an IQR,
>>>> is it not?
>>>>
>>>> Quantiles apparently can be defined in more than one "acceptable" way
>>>> (sort of like dealing with ties in rank statistics). The OP seemed to
>>>> want an "exact" explanation of the whiskers, and I think Peter has
>>>> pointed us at the definition of quartiles used by fivenum, as opposed
>>>> to the default used with quantile(..., type=7).
>>>
>>> Yes, and experimentation leads me to the conclusion that the only
>>> possible candidate for matching up the results of fivenum(y)[c(2,4)] with
>>> quantile(y, c(1,3)/4, type=i) is type=5. I'm not able to prove that
>>> to myself from mathematical arguments, since I do not quite understand
>>> the formalism in the quantile page. If the match is not exact, this
>>> would be a tenth definition of IQR.
>>>
>>> > set.seed(123)
>>> > y <- rexp(300, .02)
>>> > fivenum(y)
>>> [1] 0.2183685 15.8740466 42.1147820 74.0362517 360.5503788
>>> > for (i in 4:9) {print(quantile(y, c(1,3)/4, type=i) ) }
>>>      25%      75%
>>> 15.82506 73.93080
>>>      25%      75%
>>> 15.87405 74.03625
>>>      25%      75%
>>> 15.84955 74.08898
>>>      25%      75%
>>> 15.89854 73.98352
>>>      25%      75%
>>> 15.86588 74.05383
>>>      25%      75%
>>> 15.86792 74.04943
>>>
>>
>> -- 
>> Peter Ehlers
>> University of Calgary
>

David Winsemius, MD
West Hartford, CT


