[R] boxplot notches

Michael Friendly friendly at yorku.ca
Tue Mar 2 14:48:36 CET 2004


>> I think John Tukey's idea was that this formula (or just the fact of
>> using median and quartiles) is still often approximately correct
>> for quite a few kinds of moderate contaminations...
>
> It may be approximately correct for the width of a CI (and when I checked
> it was only approximately correct for a normal), but I would seriously
> doubt if it were approximately correct for a significance level of 5%.
> Remember how fast the tails of the asymptotic normal distribution decay: a
> 20% error turns 5% into 2%.
>
> BTW, if there is a precise reference for this it would be good to add it
> to boxplot.stats.Rd, as the confidence limits are unexplained there.
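
As a rough check of the quoted point about tail decay, here is a minimal
Python sketch using only the standard library. The helper name
two_sided_level and the variable names are illustrative assumptions, not
anything from R or from the original posts; it simply rescales the nominal
5% two-sided normal cutoff by 20% in either direction.

```python
from statistics import NormalDist

std_normal = NormalDist()  # standard normal, N(0, 1)


def two_sided_level(z):
    """Two-sided tail probability P(|Z| > z) for a standard normal Z."""
    return 2.0 * (1.0 - std_normal.cdf(z))


# Cutoff for a nominal 5% two-sided test (about 1.96).
z_nominal = std_normal.inv_cdf(0.975)

# Actual level if the interval half-width is off by 20% either way.
level_too_wide = two_sided_level(1.2 * z_nominal)
level_too_narrow = two_sided_level(0.8 * z_nominal)

print(f"nominal level:  {two_sided_level(z_nominal):.3f}")
print(f"20% too wide:   {level_too_wide:.3f}")
print(f"20% too narrow: {level_too_narrow:.3f}")
```

Under these assumptions a 20% too wide interval corresponds to a level of
about 2%, as stated in the quoted text, while a 20% too narrow one gives
nearly 12%.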

The factor 1.58 for H-spr/\sqrt{n} comes from the product of three
approximations, going from a 95% confidence interval for a difference
in means to one for a difference in medians, using the H-spr (= IQR)
instead of the standard deviation:

    H-spr/1.349 \approx \sigma in a normal dist'n (the IQR of N(0,1)
        is 1.349)
    \sqrt{\pi/2} \approx 1.25 is the ratio of the std error of a median
        to the std error of a mean, \sigma/\sqrt{n}
    1.7 is roughly the average of 1.96 and 1.39 = 1.96/\sqrt{2}, the
        factors for the standard error of the difference between two
        means, in the cases where one variance is tiny, and where both
        are equal

so that 1.7 * 1.25 / 1.349 \approx 1.58.
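
The arithmetic behind the three factors can be sketched in Python as
follows. This is only an illustration under stated assumptions: the names
(hinges, notch_limits, factor_rounded, and so on) are made up for this
example, and the hinge rule below is my reading of Tukey's hinges (median
of each half of the sorted data, sharing the middle value when n is odd),
not a copy of the boxplot.stats source.

```python
import math
import statistics

# The three approximations described above.
iqr_per_sigma = 1.349                      # H-spr of N(0,1), so H-spr/1.349 ~ sigma
median_vs_mean = math.sqrt(math.pi / 2.0)  # SE(median) / SE(mean), about 1.25
z_one_tiny = 1.96                          # one variance negligible
z_both_equal = 1.96 / math.sqrt(2.0)       # equal variances, about 1.39
z_average = (z_one_tiny + z_both_equal) / 2.0  # about 1.67, rounded to 1.7

# Product of the three factors, with and without rounding the average to 1.7.
factor_rounded = 1.7 * median_vs_mean / iqr_per_sigma
factor_unrounded = z_average * median_vs_mean / iqr_per_sigma


def hinges(x):
    """Lower and upper hinges: medians of the two halves of the sorted data."""
    s = sorted(x)
    n = len(s)
    half = (n + 1) // 2  # the middle value belongs to both halves when n is odd
    return statistics.median(s[:half]), statistics.median(s[n - half:])


def notch_limits(x, factor=1.58):
    """Notch interval: median +/- factor * H-spr / sqrt(n)."""
    lower, upper = hinges(x)
    half_width = factor * (upper - lower) / math.sqrt(len(x))
    med = statistics.median(x)
    return med - half_width, med + half_width


print(f"factor with 1.7:       {factor_rounded:.3f}")
print(f"factor with unrounded: {factor_unrounded:.3f}")
print("notch limits for 1..9:", notch_limits(range(1, 10)))
```

With the average rounded to 1.7 the product comes out at about 1.58; using
the unrounded average gives a slightly smaller value, so the quoted factor
depends on that rounding step.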

I believe this is explained in

@Article{McGill-etal:78,
  author =       "R. McGill and J. W. Tukey and W. Larsen",
  year =         "1978",
  title =        "Variations of Box Plots",
  journal =      "The American Statistician",
  volume =       "32",
  pages =        "12--16",
}

-- 
Michael Friendly     Email: friendly at yorku.ca 
Professor, Psychology Dept.
York University      Voice: 416 736-5115 x66249 Fax: 416 736-5814
4700 Keele Street    http://www.math.yorku.ca/SCS/friendly.html
Toronto, ONT  M3J 1P3 CANADA



