[R] boxplot notches

Christoph Scherber Christoph.Scherber at uni-jena.de
Tue Mar 2 15:51:06 CET 2004


In McGill et al. (1978) there's a description of the calculation as 
follows (p. 16):

"The widths [are] computed from the midspread or interquartile range (R) 
of the data (...), and the number of observations (N) for each group. 
The Gaussian-based asymptotic approximation (Kendall and Stuart 1967) of 
the standard deviation s of the median (M) is given by

s = 1.25 R / (1.35 sqrt(N))

and can be shown to be reasonably broadly applicable to other 
distributions (...)

The notch around each median can then be calculated as

M +- Cs,

where C is a constant. Should one desire a notch indicating 95 percent 
confidence interval about each median, C = 1.96 would be used (...)

It can be shown that C=1.96 would only be appropriate if the standard 
deviations of the two groups were vastly different (...) Thus, the 
notches were computed as

M +- 1.7 (1.25 R / (1.35 sqrt(N)))"
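
To make the quoted formula concrete, here is a minimal sketch in Python (standard library only). The function name mcgill_notch and the sample data are made up for illustration, and the quartiles come from statistics.quantiles with method="inclusive", which is an assumption: the paper works with hinges, and other quartile definitions can give slightly different values of R.

```python
import math
import statistics


def mcgill_notch(values, c=1.7):
    """Return (lower, upper) notch limits M +- c*s.

    s = 1.25 R / (1.35 sqrt(N)) is the approximate standard deviation
    of the median, with R the midspread and N the group size.
    """
    n = len(values)
    # Quartile definition is an assumption, not taken from the paper.
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    r = q3 - q1                              # midspread (interquartile range) R
    s = 1.25 * r / (1.35 * math.sqrt(n))     # approx. std. deviation of the median
    return median - c * s, median + c * s


data = [1, 2, 3, 4, 5, 6, 7, 8, 9]           # made-up sample, N = 9
lower, upper = mcgill_notch(data)
print(round(lower, 3), round(upper, 3))
```

Passing c=1.96 instead would give the wider interval the quote mentions for a 95 percent confidence interval about a single median.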

Hope this helps. Best regards
Chris.

REF:
McGill, R; Tukey, JW & Larsen, WA (1978): Variations of Box Plots. The 
American Statistician, Vol. 32, No. 1, pp. 12-16.
Kendall, MG & Stuart, A (1967): The Advanced Theory of Statistics, 
Vol. 1, 2nd ed., Ch. 14, New York, Hafner Publishing Co.

*****************************************


Michael Friendly wrote:

>>> I think John Tukey's idea was that this formula (or just the fact of
>>> using median and quartiles) is still often approximately correct
>>> for quite a few kinds of moderate contaminations...
>>
>> It may be approximately correct for the width of a CI (and when I 
>> checked it was only approximately correct for a normal), but I would 
>> seriously doubt if it were approximately correct for a significance 
>> level of 5%.
>> Remember how fast the tails of the asymptotic normal distribution 
>> decay: a 20% error turns 5% into 2%.
>>
>> BTW, if there is a precise reference for this it would be good to add it
>> to boxplot.stats.Rd, as the confidence limits are unexplained there.
>>
>>  
>>
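
The remark above about how fast the normal tails decay can be checked numerically. This is a minimal sketch in Python, assuming the "20% error" means a critical value (equivalently, an interval half-width) that is 20% too large. The helper name two_sided_p is made up for illustration.

```python
import math


def two_sided_p(z):
    """Two-sided tail probability of a standard normal beyond +-z."""
    return math.erfc(z / math.sqrt(2))


nominal = two_sided_p(1.96)           # nominal 5% level
inflated = two_sided_p(1.2 * 1.96)    # critical value 20% too large (assumption)
print(f"{nominal:.4f} {inflated:.4f}")
```

Under that reading, the second value comes out a little below 2%, in line with the figure quoted.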
>
> The factor 1.58 for H-spr/\sqrt{n} comes from the product of three
> approximations going from a 95% confidence interval for a difference
> in means, to one for a difference in medians, using the H-spr=IQR
> instead of the standard deviation:
>
>    H-spr/1.349 \approx \sigma for a normal dist'n
>    \sqrt{\pi/2} \approx 1.25 is the ratio of the std error of a median
>         to that of a mean
>    1.7 is roughly the average of 1.96 and 1.39=1.96/\sqrt{2}, factors
>         for the standard error of the difference between two means,
>         in the cases where one variance is tiny, and where both are
>         equal.
>
> I believe this is explained in
>
> @Article{McGill-etal:78,
>  author =       "R. McGill and J. W. Tukey and W. Larsen",
>  year =         "1978",
>  title =        "Variations of Box Plots",
>  journal =      TAS,
>  volume =       "32",
>  pages =        "12--16",
> }
>
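
The arithmetic behind the factor 1.58 described above can be reproduced in a few lines. This is only a sketch in Python. The variable names are made up for illustration, and the constants are the ones given in the quoted message.

```python
import math

# Average of the two limiting factors: 1.96 (one variance tiny)
# and 1.96/sqrt(2) (both variances equal).
c_average = (1.96 + 1.96 / math.sqrt(2)) / 2

median_vs_mean = math.sqrt(math.pi / 2)   # std error of median relative to mean
iqr_per_sigma = 1.349                     # IQR of a normal dist'n in units of sigma

# Multiplier for H-spr/sqrt(n), with the unrounded average and with 1.7.
factor_unrounded = c_average * median_vs_mean / iqr_per_sigma
factor_rounded = 1.7 * median_vs_mean / iqr_per_sigma

print(round(c_average, 3), round(factor_unrounded, 3), round(factor_rounded, 3))
```

With the unrounded average the multiplier is about 1.55. Using 1.7, as in the quoted formula, gives about 1.58.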



