[R] boxplot notches

Prof Brian Ripley ripley at stats.ox.ac.uk
Tue Mar 2 13:23:46 CET 2004


On Mon, 1 Mar 2004, David James wrote:

> Prof Brian Ripley wrote:
> > On Mon, 1 Mar 2004, Martin Maechler wrote:
> > 
> > > >>>>> "TL" == Thomas Lumley <tlumley at u.washington.edu>
> > > >>>>>     on Mon, 1 Mar 2004 09:54:48 -0800 (PST) writes:
> > > 
> > >     TL> On Mon, 1 Mar 2004, Christoph Scherber wrote:
> > >     >> Dear list members,
> > >     >> 
> > >     >> Can anyone tell me how the notches in boxplot(Y~X,notch=T) are
> > >     >> calculated? What do these notches represent exactly? I'd suppose they
> > >     >> are Confidence Intervals for the median, but I've also been told they
> > >     >> might show Least Significant Difference (LSD) equivalents.
> > > 
> > >     TL> The help page says that 
> > >     TL> " If the notches of two plots do not overlap then
> > >     TL>   the medians are significantly different at the 5 percent level."
> > > 
> > >     TL> The only thing wrong with this is that it isn't true.
> > >     TL> The code says that the notches are +/- 1.58 IQR/sqrt(n),
> > >     TL> so I think the claimed confidence level holds only for
> > >     TL> normal distributions with small amounts of contamination.
> > > 
> > > I think John Tukey's idea was that this formula (or just the fact of
> > > using median and quartiles) is still often approximately correct
> > > for quite a few kinds of moderate contaminations...
> > 
> > It may be approximately correct for the width of a CI (and when I checked 
> > it was only approximately correct for a normal), but I would seriously 
> > doubt if it were approximately correct for a significance level of 5%.
> > Remember how fast the tails of the asymptotic normal distribution decay: a 
> > 20% error turns 5% into 2%.
> > 
> > BTW, if there is a precise reference for this it would be good to add it
> > to boxplot.stats.Rd, as the confidence limits are unexplained there.
> 
> @article{McGi:Tuke:Lars:1978,
> author = {McGill, Robert and Tukey, John W. and Larsen, Wayne A.},
> title = {Variations of {B}ox plots},
> year = {1978},
> journal = {The American Statistician},
> volume = {32},
> pages = {12--16},
> keywords = {Exploratory data analysis; Graphics}
> }

That has the rationale.

> @book{Cham:Clev:Klei:Tuke:1983,
> author = {Chambers, John M. and Cleveland, William S. and Kleiner, Beat
> and Tukey, Paul A.},
> title = {Graphical methods for data analysis},
> year = {1983},
> pages = {395},
> publisher = {Wadsworth Publishing Co Inc}
> }

That gives the factor (p.62) as 1.57 rather than 1.58, and says only that
non-overlap is `strong evidence' of a difference.
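
For anyone wanting to check the limits by hand, the following is only a
sketch. The sample x is simulated purely for illustration, and taking the
hinge spread from fivenum() as the `IQR' is my reading of what
boxplot.stats() does (it can differ slightly from IQR(x)). The
decomposition of the constant in the last lines is likewise my reading of
the McGill et al. formula, not something established in this thread.

```r
## Sketch: reproduce the notch limits median +/- 1.58 * IQR / sqrt(n).
set.seed(1)
x <- rnorm(50)                    # illustrative data only
n <- sum(!is.na(x))               # number of non-missing observations

h <- fivenum(x)                   # min, lower hinge, median, upper hinge, max
hinge_spread <- h[4] - h[2]       # assumed to be the 'IQR' used for notches
notch <- h[3] + c(-1.58, 1.58) * hinge_spread / sqrt(n)

## Expected to be TRUE if boxplot.stats() works as assumed above.
all.equal(notch, boxplot.stats(x)$conf)

## Assumed origin of the constant: 1.7 * 1.25 / 1.35, which is about 1.574
## and so rounds to 1.57 rather than 1.58.
k <- 1.7 * 1.25 / 1.35
round(k, 3)
```

The same limits are what boxplot(x, notch = TRUE) draws, so comparing
notch with the plotted notches is a quick sanity check.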

I have added appropriate references to the boxplot and boxplot.stats help 
pages.
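
The remark quoted above, that a 20% error turns 5% into 2%, can be checked
directly from the normal tail. This is a sketch under the assumption that
the `error' is a 20% inflation of the two-sided critical value, which is
how I would read it.

```r
## Two-sided tail probability for a standard normal at a given critical value.
two_sided <- function(z) 2 * pnorm(-abs(z))

z05 <- qnorm(0.975)               # critical value for a two-sided 5% test
p_nominal  <- two_sided(z05)      # 0.05 by construction
p_inflated <- two_sided(1.2 * z05)  # critical value 20% too large

round(c(nominal = p_nominal, inflated = p_inflated), 3)
```

The inflated value comes out a little under 0.02, so a modest error in the
width of the interval more than halves the nominal significance level.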

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595



