[Rd] boxplot, notches, etc.
Ben Bolker
bolker at zoo.ufl.edu
Tue Oct 10 00:17:57 CEST 2006
Sorry to repost this, but it looks like it's getting
buried in r-help (originally posted October 5; my experience
is that if a question hasn't been answered by now, it won't be).
I wouldn't bother, but I suspect r-devel is the better
venue, *and* a previous e-mail of mine on the subject in
January also seemed to get buried.
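Synopsis: boxplot notches look weird when they extend
beyond the hinges (i.e. when 1.58*IQR/sqrt(n) exceeds the distance
from the median to a hinge, roughly IQR/2).
With log="y" this can cause an error, because the lower notch
limit may fall at or below zero. Below are several
reproducible examples, some discussion, and a patch against
calc.R.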
Please feel free to say "this is just cosmetic/isn't an issue, go
away" ...
cheers
Ben Bolker
bogdan romocea <br44114 <at> gmail.com> writes:
>
> A function I've been using for a while returned a surprising [to me,
> given the data] error recently:
> Error in plot.window(xlim, ylim, log, asp, ...) :
> Logarithmic axis must have positive limits
>
> After some digging I realized what was going on:
> x <- c(10460.97, 10808.67, 29499.98, 1, 35818.62, 48535.59, 1, 1,
> 42512.1, 1627.39, 1, 7571.06, 21479.69, 25, 1, 16143.85, 12736.96,
> 1, 7603.63, 1, 33155.24, 1, 1, 50, 3361.78, 1, 37781.84, 1, 1,
> 1, 46492.05, 22334.88, 1, 1)
> summary(x)
> boxplot(x,notch=TRUE,log="y") #unexpected
> boxplot(x) #ok
> boxplot(x,log="y") #ok
> boxplot(x,notch=TRUE) #aha
>
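The error above can be traced without plotting anything. A minimal check, assuming only that the notch limits boxplot() draws are the conf values returned by boxplot.stats() in grDevices:

```r
# Data from the quoted message
x <- c(10460.97, 10808.67, 29499.98, 1, 35818.62, 48535.59, 1, 1,
       42512.1, 1627.39, 1, 7571.06, 21479.69, 25, 1, 16143.85, 12736.96,
       1, 7603.63, 1, 33155.24, 1, 1, 50, 3361.78, 1, 37781.84, 1, 1,
       1, 46492.05, 22334.88, 1, 1)

s <- boxplot.stats(x)
s$stats  # lower whisker, lower hinge, median, upper hinge, upper whisker
s$conf   # notch limits: median -/+ 1.58 * IQR / sqrt(n)

# The lower notch limit is negative, so it cannot be placed on a log axis ...
s$conf[1] < 0
# ... and it also lies below the lower hinge
s$conf[1] < s$stats[2]
```

Both comparisons come out TRUE for these data, which is consistent with notch=TRUE alone drawing an odd-looking box and notch=TRUE with log="y" failing outright.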
Mick Crawley (author of several books on ecological
data analysis in R) submitted a related issue as
bug #7690, which I was mildly surprised to see
filed as "not reproducible". (I had no trouble reproducing
it at the time, and I posted my then-patch
to R-devel:
https://stat.ethz.ch/pipermail/r-devel/2006-January/036257.html )
The problem typically occurs
for very small data sets, when the notches can
extend beyond the hinges.
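To put a rough number on "very small": the notch half-width is 1.58*IQR/sqrt(n), and the nearer hinge is never more than IQR/2 from the median (taking IQR as the distance between the hinges). So at least one notch must pass its hinge whenever 1.58/sqrt(n) > 0.5 and the IQR is nonzero. A quick sketch of that arithmetic follows; the 0.5 bound is a back-of-the-envelope argument, not something taken from the R sources:

```r
n <- 2:12
ratio <- 1.58 / sqrt(n)  # notch half-width as a fraction of the IQR

# For which n is an overshoot on at least one side guaranteed?
data.frame(n = n, ratio = round(ratio, 3), must_overshoot = ratio > 0.5)

# Largest such n in this range
max(n[ratio > 0.5])
```

For larger n the notches can still pass a hinge when the data are skewed (as in the quoted example with n = 34); it just isn't guaranteed.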
As I said then,
> I can imagine debate about what should be done in this case --
> you could just say "don't do that", since the notches are based
> on an asymptotic argument ... the diff below just truncates
> the notches to the hinges, but produces a warning saying that the
> notches have been truncated.
The interaction with
log="y" is new to me, though, and my old patch
didn't catch it.
Here's my reproducible version:
set.seed(1001)
npts <- 7
X <- rnorm(2*npts,rep(c(3,4.5),each=npts),sd=1)
f <- factor(rep(1:2,each=npts))
par(mfrow=c(1,2))
boxplot(X~f,notch=TRUE)
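The same thing can be checked numerically for this example. A sketch, assuming boxplot.stats() is applied to each group separately, as boxplot() does; overshoot is a hypothetical helper name, not an R function:

```r
set.seed(1001)
npts <- 7
X <- rnorm(2*npts, rep(c(3, 4.5), each = npts), sd = 1)
f <- factor(rep(1:2, each = npts))

# TRUE if either notch limit lies outside the hinges (hypothetical helper)
overshoot <- function(x) {
  s <- boxplot.stats(x)
  s$conf[1] < s$stats[2] || s$conf[2] > s$stats[4]
}

# One logical value per group
tapply(X, f, overshoot)
```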
A possible fix is to truncate the notches
(and issue a warning) when this happens,
in src/library/grDevices/R/calc.R:
[WATCH OUT FOR LINE WRAPPING BELOW!]
*** calc.R 2006-10-07 17:44:49.000000000 -0400
--- newcalc.R 2006-10-07 19:25:38.000000000 -0400
***************
*** 16,21 ****
--- 16,26 ----
if(any(out[nna])) stats[c(1, 5)] <- range(x[!out], na.rm = TRUE)
}
conf <- if(do.conf) stats[3] + c(-1.58, 1.58) * iqr / sqrt(n)
+ if (do.conf) {
+ if (conf[1]<stats[2] || conf[2]>stats[4]) warning("confidence limits > hinges: notches truncated")
+ conf[1] <- max(conf[1],stats[2])
+ conf[2] <- min(conf[2],stats[4])
+ }
list(stats = stats, n = n, conf = conf,
out = if(do.out) x[out & nna] else numeric(0))
}
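For anyone who wants to try the idea without patching R, the same truncation can be sketched as a user-level wrapper around boxplot.stats(). The name boxplot_stats_truncated is hypothetical, and the NULL/NA guard is an addition that is not in the patch above:

```r
# Hypothetical wrapper, not part of R: clamp the notch limits to the hinges
boxplot_stats_truncated <- function(x, ...) {
  s <- boxplot.stats(x, ...)
  # conf is NULL when do.conf = FALSE; also skip missing limits
  if (!is.null(s$conf) && !anyNA(s$conf)) {
    if (s$conf[1] < s$stats[2] || s$conf[2] > s$stats[4])
      warning("confidence limits > hinges: notches truncated")
    s$conf[1] <- max(s$conf[1], s$stats[2])
    s$conf[2] <- min(s$conf[2], s$stats[4])
  }
  s
}

set.seed(1001)
x <- rnorm(7, mean = 3)
s <- suppressWarnings(boxplot_stats_truncated(x))
s$conf            # notch limits, now within the hinges
s$stats[c(2, 4)]  # hinges, for comparison
```

This only changes the returned numbers; boxplot() itself would still draw the untruncated notches unless calc.R is patched.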