[Rd] box and whisker (PR#13821)
maechler at stat.math.ethz.ch
Sat Jul 18 10:51:17 CEST 2009
>>>>> "PD" == Peter Dalgaard <p.dalgaard at biostat.ku.dk>
>>>>> on Sun, 12 Jul 2009 11:11:37 +0200 writes:
PD> m.crawley at imperial.ac.uk wrote:
>> In a Box and Whisker plot, I thought that when there are outliers both above
>> and below the whiskers, then the whiskers should both be the same length
>> (plus or minus 1.5 times the inter-quartile range).
PD> Not according to the docs:
PD> range: this determines how far the plot whiskers extend out from the
PD> box. If 'range' is positive, the whiskers extend to the most
PD> extreme data point which is no more than 'range' times the
PD> interquartile range from the box. A value of zero causes the
PD> whiskers to extend to the data extremes.
PD> And the code itself has
PD> stats[c(1, 5)] <- range(x[!out], na.rm = TRUE)
PD> So the whisker won't be equal to 1.5 IQR unless there happens to be an
PD> observation there.
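A minimal R sketch of the rule quoted above, assuming the default `coef = 1.5` and data without `NA`s. The helper `whisker_ends()` is hypothetical (not part of R): it re-derives the whisker ends from `fivenum()` the way the quoted `boxplot.stats()` line does, so the result can be checked against `boxplot.stats()` itself. Because each whisker stops at an actual observation inside its fence, the two whiskers generally differ in length. Each is at most 1.5 times the hinge spread, but neither has to equal it.

```r
# Hypothetical helper (not part of R): recomputes the whisker ends as
# boxplot.stats() does, i.e. stats[c(1, 5)] <- range(x[!out]).
whisker_ends <- function(x, coef = 1.5) {
  h <- fivenum(x)                 # min, lower hinge, median, upper hinge, max
  spread <- h[4] - h[2]           # hinge spread, used where the "IQR" is mentioned
  lo_fence <- h[2] - coef * spread
  hi_fence <- h[4] + coef * spread
  inside <- x >= lo_fence & x <= hi_fence   # complement of the "out" points
  range(x[inside])                # most extreme observations within the fences
}

set.seed(9)
x <- rnorm(50)
h <- fivenum(x)
w <- whisker_ends(x)

# Whisker lengths measured from the box (the hinges): both are bounded by
# 1.5 * spread, but in general they are not equal to it or to each other.
c(lower = h[2] - w[1], upper = w[2] - h[4])

# Should agree with R's own computation.
all.equal(w, boxplot.stats(x)$stats[c(1, 5)])
```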
PD> Now, this might be wrong, but people have tried very hard to make the
PD> implementation follow the original definition due to Tukey. I.e., if you
PD> can point out that Tukey specified it otherwise, then we'd change it,
PD> otherwise it is just not a bug.
I'd bet pretty large amounts that we (and S and S-plus, and probably
quite a few other packages) have implemented the whiskers the way
JWT (John W. Tukey) defined them, very purposefully.
One of JWT's points *was* exactly that most of the values "drawn"
represent *observations* (and those that do not are
exact midpoints of two observations):
It's not by coincidence or even oddity that the box is *not*
delineated by the usual quartiles, but rather by the *hinges*.
[ Digression about hinges vs. quartiles:
?boxplot.stats
has a 'Details' section, to which I added such information about a
decade ago.
Whereas our R help pages (?boxplot.stats, ?fivenum)
do use the correct definitions,
unfortunately many other places do *not*; e.g., even the
Wikipedia page http://en.wikipedia.org/wiki/Five-number_summary
wrongly talks about the 1st and 3rd quartiles,
but then at least gives a numerical example that uses the hinges.
]
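To make the hinge/quartile distinction concrete, here is a small comparison of `fivenum()` with the default (type 7) `quantile()`. The six-value sample is made up purely for illustration. With n = 6 the hinges are actual observations, while the type 7 quartiles are interpolated between observations; `boxplot.stats()` uses the hinges.

```r
x <- c(1, 2, 3, 4, 5, 6)      # made-up illustrative data, n = 6

fivenum(x)                    # 1.0 2.0 3.5 5.0 6.0 : hinges are 2 and 5
quantile(x, c(0.25, 0.75))    # 25%: 2.25, 75%: 4.75 (default type = 7)

# No points fall outside the fences here, so the box-plot statistics are
# exactly the five-number summary, hinges included.
boxplot.stats(x)$stats
```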
Martin Maechler, ETH Zurich
>> If you look at the plot for SilwoodWeather on p.155 of The R Book, you will
>> see that for November (month = 11) the upper whisker is shorter than the
>> lower, while for other months with outliers both above and below, the lines
>> are the same lengths.
PD> For easier reproduction (reproducible examples should not refer to files
PD> on your C: drive...):
PD> > diff(boxplot({set.seed(9);x<-rnorm(50)})$stats)
PD> [,1]
PD> [1,] 1.2525857
PD> [2,] 0.5412128
PD> [3,] 0.6083348
PD> [4,] 1.4625057
PD> --
PD> O__ ---- Peter Dalgaard Øster Farimagsgade 5, Entr.B
PD> c/ /'_ --- Dept. of Biostatistics PO Box 2099, 1014 Cph. K
PD> (*) \(*) -- University of Copenhagen Denmark Ph: (+45) 35327918
PD> ~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk) FAX: (+45) 35327907
PD> ______________________________________________
PD> R-devel at r-project.org mailing list
PD> https://stat.ethz.ch/mailman/listinfo/r-devel