[R] boxplot notches
Prof Brian Ripley
ripley at stats.ox.ac.uk
Mon Mar 1 21:06:50 CET 2004
On Mon, 1 Mar 2004, Martin Maechler wrote:
> >>>>> "TL" == Thomas Lumley <tlumley at u.washington.edu>
> >>>>> on Mon, 1 Mar 2004 09:54:48 -0800 (PST) writes:
>
> TL> On Mon, 1 Mar 2004, Christoph Scherber wrote:
> >> Dear list members,
> >>
> >> Can anyone tell me how the notches in boxplot(Y~X,notch=T) are
> >> calculated? What do these notches represent exactly? I'd suppose they
> >> are Confidence Intervals for the median, but I've also been told they
> >> might show Least Significant Difference (LSD) equivalents.
>
> TL> The help page says that
> TL> " If the notches of two plots do not overlap then
> TL> the medians are significantly different at the 5 percent level."
>
> TL> The only thing wrong with this is that it isn't true.
> TL> The code says that the notches are +/- 1.58 IQR/sqrt(n),
> TL> so I think the claimed confidence level holds only for
> TL> normal distributions with small amounts of contamination.
>
> I think John Tukey's idea was that this formula (or just the fact of
> using median and quartiles) is still often approximately correct
> for quite a few kinds of moderate contaminations...
It may be approximately correct for the width of a CI (and when I checked
it was only approximately correct for a normal), but I would seriously
doubt if it were approximately correct for a significance level of 5%.
Remember how fast the tails of the asymptotic normal distribution decay: a
20% error turns 5% into 2%.
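The arithmetic behind that last remark can be checked in a couple of lines of R. This is only a sketch under the assumption that the test statistic is exactly standard normal and that the "20% error" is a multiplicative error in the critical value; the names z, p_wide and p_narrow are illustrative.

```r
# Two-sided 5% critical value of the standard normal (about 1.96)
z <- qnorm(0.975)

# Actual two-sided level if the critical value is 20% too large
p_wide <- 2 * pnorm(-1.2 * z)    # about 0.019, i.e. roughly 2%

# Actual two-sided level if the critical value is 20% too small
p_narrow <- 2 * pnorm(-0.8 * z)  # about 0.12

print(c(p_wide = p_wide, p_narrow = p_narrow))
```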
BTW, if there is a precise reference for this it would be good to add it
to boxplot.stats.Rd, as the confidence limits are unexplained there.
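For anyone wanting to see where those limits come from, the following sketch reproduces the conf component of boxplot.stats by hand from the +/- 1.58 IQR/sqrt(n) formula quoted above. It assumes the "IQR" is the distance between the hinges returned by fivenum, which may differ slightly from IQR(x); the simulated data and the names x, h and notch are illustrative only.

```r
set.seed(1)        # illustrative data only
x <- rnorm(50)

h <- fivenum(x)    # min, lower hinge, median, upper hinge, max
n <- sum(!is.na(x))

# Notch limits: median +/- 1.58 * (hinge spread) / sqrt(n)
notch <- h[3] + c(-1.58, 1.58) * (h[4] - h[2]) / sqrt(n)

# Compare with the limits reported by boxplot.stats
print(notch)
print(boxplot.stats(x)$conf)
```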
--
Brian D. Ripley, ripley at stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595