# [R] boxplot notches

Frank E Harrell Jr feh3k at spamcop.net
Tue Mar 2 18:31:33 CET 2004

```
On Tue, 2 Mar 2004 07:24:41 -0800 (PST)
Thomas Lumley <tlumley at u.washington.edu> wrote:

> On Tue, 2 Mar 2004, P. B. Pynsent wrote:
>
> > A Google search showed that all this was discussed in April 1998 with
> > an extensive reply to the question from M Maechler.
> > I, as a non-statistician, blindly believed what was written in the
> > boxplot() help file. I am sure many would be grateful if this help
> > were modified.
> >
> > I still do not understand why, 6 years later with GHz processors,
> > boxplot() could not have an option to produce exact intervals. After
> > all, a range option is offered for the whiskers.
> > At least then non-overlapping notches would have some meaning,
> > wouldn't they?
>
> Well, they would have *some* meaning, but it would be hard to say
> exactly what. There isn't an exact confidence interval for the
> difference in medians, so you can't find a level for two confidence
> intervals that corresponds to a specified level for the test of equality
> of medians.
>
> 	-thomas

I have been using the following approximation to get a confidence interval
for the difference in two medians.

1. Compute the nonparametric confidence interval for each median (which
selects two order statistics).

2. Solve for the standard error that, using the normal approximation,
would yield the same confidence interval width as the nonparametric
interval.

3. For the confidence limits for the difference, use a normal approximation
with a standard error equal to the square root of the sum of squares of
the standard errors computed in step 2.

The S code for steps 1 and 2 is:

y <- sort(y[!is.na(y)])
n <- length(y)
r <- pmin(qbinom(c(.025,.975), n, .5) + 1, n)  ## Exact 0.95 C.L.
w <- y[r[2]] - y[r[1]]                         ## Width of C.L.
var.med <- ((w/1.96)^2)/4      ## Approximate variance of median

-Frank Harrell

```
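
The post gives code for steps 1 and 2 only. The following is a minimal sketch of how all three steps might fit together, not code from the original message. The function names `median_se` and `median_diff_ci` are hypothetical, the 0.95 level is fixed as in the post, and `qnorm(0.975)` stands in for the constant 1.96.

```r
## Steps 1 and 2: approximate standard error of a sample median,
## backed out of the width of the nonparametric 0.95 interval.
## (Hypothetical helper; the body follows the S code in the post.)
median_se <- function(y) {
  y <- sort(y[!is.na(y)])
  n <- length(y)
  z <- qnorm(0.975)                                  # about 1.96
  r <- pmin(qbinom(c(.025, .975), n, .5) + 1, n)     # order-statistic indices
  w <- y[r[2]] - y[r[1]]                             # width of the interval
  w / (2 * z)                                        # same as sqrt(((w/z)^2)/4)
}

## Step 3: normal-approximation limits for the difference in medians,
## using the root sum of squares of the two standard errors.
## (Hypothetical helper; assumes two independent samples.)
median_diff_ci <- function(y1, y2) {
  z  <- qnorm(0.975)
  d  <- median(y1, na.rm = TRUE) - median(y2, na.rm = TRUE)
  se <- sqrt(median_se(y1)^2 + median_se(y2)^2)
  c(difference = d, lower = d - z * se, upper = d + z * se)
}

## Example with simulated data (values are illustrative only)
set.seed(1)
a <- rnorm(50, mean = 10)
b <- rnorm(60, mean = 11)
median_diff_ci(a, b)
```

As in the post, the result is an approximation: it treats each median as roughly normal and the two samples as independent, so the limits should be read with that in mind.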