[R] unexpected behavior of boxplot(x, notch=TRUE, log="y")

Ben Bolker bolker at zoo.ufl.edu
Sun Oct 8 01:40:01 CEST 2006


bogdan romocea <br44114 <at> gmail.com> writes:

> 
> A function I've been using for a while returned a surprising [to me,
> given the data] error recently:
>    Error in plot.window(xlim, ylim, log, asp, ...) :
>        Logarithmic axis must have positive limits
> 
> After some digging I realized what was going on:
> x <- c(10460.97, 10808.67, 29499.98, 1, 35818.62, 48535.59, 1, 1,
>    42512.1, 1627.39, 1, 7571.06, 21479.69, 25, 1, 16143.85, 12736.96,
>    1, 7603.63, 1, 33155.24, 1, 1, 50, 3361.78, 1, 37781.84, 1, 1,
>    1, 46492.05, 22334.88, 1, 1)
> summary(x)
> boxplot(x,notch=TRUE,log="y")  #unexpected
> boxplot(x)  #ok
> boxplot(x,log="y")  #ok
> boxplot(x,notch=TRUE)  #aha
> 

  Mick Crawley (author of several books on ecological
data analysis in R) submitted a related issue as
bug #7690, which I was mildly surprised to see
filed as "not reproducible" (I had no trouble reproducing
it at the time, and I posted my then-patch to R-devel:
https://stat.ethz.ch/pipermail/r-devel/2006-January/036257.html ).
The problem typically occurs
for very small data sets, when the notches can
extend beyond the hinges.
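
A rough check on why small samples are the usual culprit: boxplot.stats()
places the notch ends at median -/+ 1.58 * IQR / sqrt(n). If we assume, purely
for illustration, a box whose median lies exactly midway between the hinges,
the notch overruns the hinge whenever 1.58 / sqrt(n) > 0.5. The variable names
below are made up for this sketch.

```r
## Sketch under an assumption: a symmetric box (median midway between hinges).
## The notch half-width is 1.58 * IQR / sqrt(n); each hinge lies IQR / 2
## from the median, so the notch passes the hinge when 1.58 / sqrt(n) > 0.5.
n <- 2:20
half_width_in_iqr <- 1.58 / sqrt(n)     # notch half-width in units of IQR
small_n <- n[half_width_in_iqr > 0.5]   # sample sizes where the notch overruns
print(range(small_n))
```

With a skewed box the overrun can happen at larger n as well, because the
median then sits close to one of the hinges.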

  As I said then,

>  I can imagine debate about what should be done in this case --
> you could just say "don't do that", since the notches are based
> on an asymptotic argument ... the diff below just truncates
> the notches to the hinges, but produces a warning saying that the 
> notches have been truncated.

The interaction with
log="y" is new to me, though, and my old patch
didn't catch it.
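
The log="y" failure can be checked directly with boxplot.stats() on the data
from the original post. The lower hinge is 1 and the median is far above it,
so the lower notch end falls below zero, and a negative limit cannot be placed
on a logarithmic axis. A minimal check (the name st is only for illustration):

```r
## Data from the original post
x <- c(10460.97, 10808.67, 29499.98, 1, 35818.62, 48535.59, 1, 1,
   42512.1, 1627.39, 1, 7571.06, 21479.69, 25, 1, 16143.85, 12736.96,
   1, 7603.63, 1, 33155.24, 1, 1, 50, 3361.78, 1, 37781.84, 1, 1,
   1, 46492.05, 22334.88, 1, 1)

st <- boxplot.stats(x)   # the statistics that boxplot() draws
print(st$stats)          # whisker, lower hinge, median, upper hinge, whisker
print(st$conf)           # notch ends: median -/+ 1.58 * IQR / sqrt(n)
print(st$conf[1] < 0)    # TRUE here: the lower notch end is negative
```

Without notch=TRUE the notch ends do not enter the axis limits, which would
explain why boxplot(x, log="y") alone works.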

   Here's my reproducible version:

set.seed(1001)
npts <- 7
X <- rnorm(2*npts,rep(c(3,4.5),each=npts),sd=1)
f <- factor(rep(1:2,each=npts))
par(mfrow=c(1,2))
boxplot(X~f,notch=TRUE)

  A possible fix is to truncate the notches to the hinges
(and issue a warning) when this happens,
in boxplot.stats() in src/library/grDevices/R/calc.R:

*** calc.R      2006-10-07 17:44:49.000000000 -0400
--- newcalc.R   2006-10-07 19:25:38.000000000 -0400
***************
*** 16,21 ****
--- 16,26 ----
        if(any(out[nna])) stats[c(1, 5)] <- range(x[!out], na.rm = TRUE)
      }
      conf <- if(do.conf) stats[3] + c(-1.58, 1.58) * iqr / sqrt(n)
+     if (do.conf) {
+       if (conf[1]<stats[2] || conf[2]>stats[4]) warning("confidence limits > hinges: notches truncated")
+       conf[1] <- max(conf[1],stats[2])
+       conf[2] <- min(conf[2],stats[4])
+     }
      list(stats = stats, n = n, conf = conf,
         out = if(do.out) x[out & nna] else numeric(0))
  }
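
For anyone who would rather not patch R, the same truncation can be sketched
at user level: compute the statistics with boxplot(..., plot = FALSE), clamp
the notch ends to the hinges, and draw with bxp(). The helper clamp_notches()
is hypothetical (it is not part of R) and only mirrors the idea of the diff
above.

```r
## Hypothetical helper (not part of R): clamp notch ends to the hinges
## in the list returned by boxplot(..., plot = FALSE).
clamp_notches <- function(b) {
  lower <- b$stats[2, ]   # lower hinges, one per group
  upper <- b$stats[4, ]   # upper hinges, one per group
  if (any(b$conf[1, ] < lower | b$conf[2, ] > upper, na.rm = TRUE))
    warning("confidence limits > hinges: notches truncated")
  b$conf[1, ] <- pmax(b$conf[1, ], lower)
  b$conf[2, ] <- pmin(b$conf[2, ], upper)
  b
}

## Data from the original post
x <- c(10460.97, 10808.67, 29499.98, 1, 35818.62, 48535.59, 1, 1,
   42512.1, 1627.39, 1, 7571.06, 21479.69, 25, 1, 16143.85, 12736.96,
   1, 7603.63, 1, 33155.24, 1, 1, 50, 3361.78, 1, 37781.84, 1, 1,
   1, 46492.05, 22334.88, 1, 1)

b  <- boxplot(x, plot = FALSE)     # compute the statistics only
b2 <- clamp_notches(b)             # warns, then truncates the notches
bxp(b2, notch = TRUE, log = "y")   # draw from the adjusted statistics
```

Because the hinges of positive data are themselves positive, the clamped
notch ends stay positive and the log axis can be set up.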


