[R] help understanding box plots
Peter Dalgaard BSA
p.dalgaard at biostat.ku.dk
Fri Feb 22 11:52:54 CET 2002
Jay Pfaffman <pfaffman at relaxpc.com> writes:
> Another naive stats question. I'm trying to better understand what
> boxplots are telling me.
>
> I think what I see is the median and the boundaries of the 1st and 3rd
> quartiles. The whiskers represent the range of the data unless there
> are points which are outside "range" (default: 1.5) times the distance
> from the median to that quartile. Is that right?
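Not quite. The limit is 1.5 times the length of the entire box (the
distance between the hinges), measured from the edge of the box, not
1.5 times the distance from the median to the quartile.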
> I've read the
> documentation for boxplot numerous times, but don't quite understand
> it well enough to communicate it to my professor who's helping me with
> this project. (You'll be relieved to know that neither of us fancies
> ourself a statistician!)
boxplot.stats.Rd had a typo and got updated recently in the
development and patch versions to read
\item{coef}{this determines how far the plot ``whiskers'' extend out
  from the box.  If \code{coef} is positive, the whiskers extend to the
  most extreme data point which is no more than \code{coef} times
  the length of the box away from the box.  A value of zero causes
  the whiskers to extend to the data extremes (and no outliers be
  returned).}
(for some reason this hasn't yet found its way to the online snapshot
manuals in http://stat.ethz.ch/R-alpha/R-devel/doc/html/ and friends.
Martin?)
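To make the whisker rule in that help text concrete, here is a small
sketch that recomputes the whisker ends by hand and compares them with
what boxplot.stats() returns. The data vector and the variable names
(hinges, box_len, lower_fence, upper_fence, whiskers) are made up for
illustration; only boxplot.stats() and its coef argument come from R
itself.

```r
# Small made-up sample with one value far from the rest
x <- c(2, 3, 4, 5, 6, 7, 8, 9, 30)

s <- boxplot.stats(x, coef = 1.5)
hinges  <- s$stats[c(2, 4)]   # lower and upper hinge (the box edges)
box_len <- diff(hinges)       # length of the entire box

# Limits: coef (here 1.5) box lengths beyond each edge of the box
lower_fence <- hinges[1] - 1.5 * box_len
upper_fence <- hinges[2] + 1.5 * box_len

# Whiskers end at the most extreme data points inside those limits
whiskers <- range(x[x >= lower_fence & x <= upper_fence])

s$stats[c(1, 5)]   # whisker ends as reported by boxplot.stats()
whiskers           # the same ends, computed by hand
s$out              # points beyond the whiskers, drawn individually

# coef = 0: whiskers run to the data extremes and no outliers are returned
s0 <- boxplot.stats(x, coef = 0)
```

Note that the limits are measured from the box edges, so a point can
sit well over 1.5 box lengths from the median and still be inside the
whiskers.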
> V&R (p. 122) claims that the hinges are "roughly quartiles," so
> perhaps my naive understanding is close enough.
Yes. The exact definition is slightly peculiar, but in compliance with
the original definition by Tukey. So I'm told, anyway.
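If you want to see how far "roughly" goes for a sample of your size,
something like the following compares the hinges from fivenum() with
the default quantile() quartiles. The vector 1:12 is just a stand-in
for a data set with n = 12, not your actual data.

```r
x <- 1:12                               # stand-in sample, n = 12

# Tukey's five-number summary; elements 2 and 4 are the hinges
hinges <- fivenum(x)[c(2, 4)]

# Quartiles as computed by quantile() with its default settings
quartiles <- unname(quantile(x, c(0.25, 0.75)))

hinges
quartiles
quartiles - hinges                      # small but non-zero differences
```

For small n the two can differ a little, which is why the hinges are
only described as roughly the quartiles.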
> I've got a relatively small data set (n~=12). I think it would help
> to see the data points plotted on top of the boxplots. Here's what
> I'm doing now:
>
> par(las=2,ps=14,mar=c(15, 4, 4, 2))
> boxplot(split(ranks,c(1:25)), names=items, notch=T, horizontal=F, add=F)
>
> If I could get the points of each of the 25 variables plotted on top
> of the box, that'd be great.
Not sure what you're doing there, but maybe some code like this could
help:
x1 <- rnorm(20)
x2 <- rnorm(20)
boxplot(list(x1 = x1, x2 = x2))
# the boxes are drawn at x = 1 and x = 2, so overlay the raw data there
points(cbind(1, x1))
points(cbind(2, x2))
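With many groups, as in your split() call, the same idea can be put in
a loop. This is only a sketch on made-up data: values, group and
groups are hypothetical names standing in for your ranks and items,
and I am assuming the list returned by split() is in the same order as
the boxes.

```r
set.seed(1)                                  # reproducible made-up data
values <- rnorm(36)
group  <- rep(c("a", "b", "c"), each = 12)   # hypothetical grouping, n = 12 each
groups <- split(values, group)

boxplot(groups)
# box i is drawn at x = i, so overlay each group's points at that position
for (i in seq_along(groups)) {
  points(rep(i, length(groups[[i]])), groups[[i]])
}
```

If points with identical values hide each other, stripchart(groups,
vertical = TRUE, add = TRUE, method = "jitter") is probably worth a
try in place of the loop.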
--
O__ ---- Peter Dalgaard Blegdamsvej 3
c/ /'_ --- Dept. of Biostatistics 2200 Cph. N
(*) \(*) -- University of Copenhagen Denmark Ph: (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk) FAX: (+45) 35327907