[Rd] Small inconsistency with boxplot
Martin Maechler
maechler at stat.math.ethz.ch
Fri Nov 18 09:44:36 CET 2011
> Dear R-core team,
> I think I found a small inconsistency in the boxplot function. I don't want to post it as a bug since I'm not sure this might be considered as one according to the FAQ --- and this is not a major problem. Don't hesitate to tell me if I'm wrong.
> If you try to do a boxplot on a matrix and set the "at" argument to some vector different from 1:n, where n is the number of columns of your matrix, then some boxplots will be hidden, since the default "xlim" value will be set to c(0.5, n + 0.5) during the call of the bxp function.
> Currently you can easily bypass this problem by setting "xlim" appropriately when calling the boxplot function.
Yes. And the help page for bxp even has the following note:
\note{
if \code{add = FALSE}, the default is \code{xlim = c(0.5, n +0.5)}.
It will usually be a good idea to specify the latter if the "x" axis
has a log scale or \code{at} is specified or \code{width} is far from
uniform.
}
which clearly documents the current behavior.
(and one could say also ``excuses'' the current behavior)
In this sense, there's really no bug ... ;-) and you were
very wise (or at least cautious :-) *not* to post it as a bug ..
> I think it would be better if all boxplots were always shown unless the "xlim" argument is specified. (I noticed this behavior when I tried to do boxplots on conditional simulations of a stochastic process; in that case the suggested behavior might be useful.)
I do agree that such a change would be more ``logical'' i.e.,
according to "The Rule of Least Surprise"
(a good software design principle of providing a default behavior
of "least surprise" to the user).
> Here's an example
> par(mfrow = c(1, 3))
> data <- matrix(rnorm(10 * 50), 50)
> colnames(data) <- letters[1:10]
> x.pos <- seq(-10, 10, length = 10)
> boxplot(data, at = x.pos) ## only the last 5 boxplots will appear
> boxplot(data, at = 1:10) ## all boxplots will appear
> boxplot(data, at = x.pos, xlim = range(x.pos) + c(-0.5, 0.5)) ## all boxplots will be shown
> I tried to do a patch if you want to change the current behavior --- note this is my first patch ever so maybe I'm doing it wrong.
it looks good.
In the end, I would use
xlim <- range(at, finite=TRUE) + c(-0.5, 0.5)
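
To make the difference concrete, here is a minimal sketch comparing the current default with the proposed one for the positions from the example above. The helper names (default_xlim, proposed_xlim, inside) are made up for illustration and are not part of the graphics package. The last two lines only illustrate what finite=TRUE does to a vector containing NA or Inf; whether such values can sensibly reach this point inside boxplot() is an assumption I have not checked.

```r
## Hypothetical helpers, for illustration only (not in 'graphics').
default_xlim  <- function(n)  c(0.5, n + 0.5)                        # current default
proposed_xlim <- function(at) range(at, finite = TRUE) + c(-0.5, 0.5) # suggested default

## TRUE for every box centre that lies within the given x limits
inside <- function(at, xlim) at >= xlim[1] & at <= xlim[2]

x.pos <- seq(-10, 10, length.out = 10)

n_default  <- sum(inside(x.pos, default_xlim(length(x.pos))))  # centres visible now
n_proposed <- sum(inside(x.pos, proposed_xlim(x.pos)))         # centres visible after the change

## Why finite = TRUE: min()/max() propagate NA, range(*, finite = TRUE) drops NA and +/-Inf
bad <- c(1, NA, 3, Inf)
min(bad) - 0.5                 # NA
proposed_xlim(bad)             # 0.5 3.5
```

With the example positions, n_default is 5 and n_proposed is 10, which matches the "only the last 5 boxplots will appear" comment in the example.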
There's one ***BIG*** question though:
How probable is it that it breaks someone else's code?
Note that boxplot() and bxp() are *REALLY* old traditional S
functions
(and for all the young guys: Boxplots were invented/proposed
by the famous John W. Tukey, co-inventor of the FFT, who also coined
the word "bit" and the term "exploratory data analysis", etc etc.
He was then (partly) at Bell Labs, which via John Chambers and
co-workers also "donated" the S language and hence R to the world !)
and therefore you can expect many many uses of boxplot() in
other code...
and hence, it could well be that some code has (probably
implicitly) *relied* on the current "more surprising" behavior.
I'd still advocate changing the default here,
but we really have to discuss this, as a change also may have
adverse consequences.
Martin Maechler, ETH Zurich (and R Core)
> *** Downloads/R-2.14.0/src/library/graphics/R/boxplot.R Mon Oct 3 00:02:21 2011
> --- boxplot.R Thu Nov 17 23:02:45 2011
> ***************
> *** 203,209 ****
> }
> if(is.null(pars$xlim))
> ! xlim <- c(0.5, n + 0.5)
> else {
> xlim <- pars$xlim
> pars$xlim <- NULL
> --- 203,209 ----
> }
> if(is.null(pars$xlim))
> ! xlim <- c(min(at) - 0.5, max(at) + 0.5)
> else {
> xlim <- pars$xlim
> pars$xlim <- NULL