[R] help understanding box plots

David James dj at research.bell-labs.com
Tue Feb 26 14:13:46 CET 2002


Agustin Lobo wrote:
> Gracias!
> 
> This is really close. I'm just missing the "notches", whose meaning
> is explained in the help page 
> ("If the notches of two plots do not overlap then the
>           medians are significantly different at the 5 percent level.")
> but not their actual definition. 
> 
> I presume that they are calculated using a t-value but taking the median
> and m.a.d. instead of the mean and var, but this is nowhere specified.
> With skewed distributions notches often
> make funny plots.

Right, and they look funny when n is small, too.  No t-value or m.a.d.
is involved: the notches are drawn at Median ± 1.57*IQR/sqrt(n)
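
A minimal sketch of that formula, for anyone who wants to check it by
hand.  The variable names (notch.157, notch.158, r.conf) are only
illustrative.  Two assumptions worth flagging: boxplot.stats() in
current versions of R uses the constant 1.58 rather than 1.57, and it
takes the IQR as the distance between the hinges from fivenum(), not
from quantile().

```r
# Notch limits computed by hand and compared with what R itself reports.
set.seed(1)                      # arbitrary seed, for reproducibility only
y <- rnorm(300) + runif(300)     # the sample suggested in this thread

n   <- length(y)
h   <- fivenum(y)                # min, lower hinge, median, upper hinge, max
iqr <- h[4] - h[2]               # hinge spread, as boxplot.stats() uses

notch.157 <- h[3] + c(-1.57, 1.57) * iqr / sqrt(n)   # constant quoted above
notch.158 <- h[3] + c(-1.58, 1.58) * iqr / sqrt(n)   # constant in boxplot.stats()

r.conf <- boxplot.stats(y)$conf  # limits used by boxplot(..., notch = TRUE)
print(rbind(notch.157, notch.158, r.conf))
```

Since the half-width shrinks like 1/sqrt(n), a small n gives notches
that can extend past the hinges, which is where the funny shapes come
from.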

> 
> An example using your function, e.g., bp.example(y=rnorm(300)+runif(300)),
> should be in the help page! Why not send it to r-help?
> 
> Agus
> 
> Dr. Agustin Lobo
> Instituto de Ciencias de la Tierra (CSIC)
> Lluis Sole Sabaris s/n
> 08028 Barcelona SPAIN
> tel 34 93409 5410
> fax 34 93411 0012
> alobo at ija.csic.es
> 
> 
> On Fri, 22 Feb 2002, David James wrote:
> 
> > Hola!
> > 
> > The function below does something close to that.  
> > 
> > "bp.example" <- 
> > function(y, xlab = "Fraction (p)", ylab = "Quantiles Q(p)", ...)
> > {
> >    ##
> >    ## adapted from Bill Cleveland's Visualizing Data (1993)
> >    ##
> >    split.screen(fig = rbind(c(0,2/3, 0, 1), c(2/3, 1, 0, 1)))
> >    screen(1)
> >    par(cex=1); par(mar=c(5.1, 4.1, 2, 1))
> > 
> >    q <- quantile(y, c(0.25, 0.5, 0.75))
> >    y <- sort(y)
> >    p <- ppoints(y)
> >    iq <- q[3] - q[1]
> >    bxp.adj<- q[c(1,3)] + c(-1.5, 1.5)*iq 
> >    lower.adj<- min(y[y>=bxp.adj[1]])
> >    upper.adj<- max(y[y<=bxp.adj[2]])
> > 
> >    plot(p, y, xlab=xlab, ylab=ylab, ...)
> >    u <- par("usr")
> >    b <- (q[3]-q[1])/.50
> >    a <- q[1] - .25 * b
> >    abline(a,b)
> >    abline(h=q, col = 2, lty=2)
> >    abline(h=c(lower.adj, upper.adj), col = 3, lty=2)
> >    cxy <- par("cxy")
> > 
> >    # lower right annotations
> >    text(x = u[c(2,2)] - cxy[1], 
> >         y = c(lower.adj,q[1]) + cxy[2]/2,
> >         c("lower adjacent", "lower quartile (Q1)"), adj=1)
> >    text(x = u[2]-cxy[1], lower.adj- 0.75*cxy[2], 
> >         'min(y[y >= Q1 - 1.5 * IQR])', adj=1)
> > 
> >    # upper left annotations
> >    text(x = u[c(1,1,1)] + cxy[1], 
> >         y = c(q[-1], upper.adj) + cxy[2]/2,
> >         c("median (Q2)", "upper quartile (Q3)", "upper adjacent"), adj=0)
> >    text(x = u[1]+cxy[1], y=upper.adj-0.75*cxy[2],
> >         'max(y[ y <= Q3 + 1.5 * IQR])', adj=0)
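> > 
> >    # boxplot of the same data in the right-hand panel
> >    screen(2)
> >    ylim <- range(y)
> >    par(cex=1); par(mar=c(5.1, 1.0, 2, 1))
> >    boxplot(y, ylim = ylim, axes=FALSE, col=2)
> >    axis(2, labels=FALSE) 
> >    box()
> >    close.screen(all.screens=TRUE)
> >    # return the plotted values last, so they really are the result
> >    invisible(list(x = p, y = y, a = a, b = b))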
> > }
> > 
> > Agustin Lobo wrote:
> > > 
> > > I've always thought that it would be most useful to have a
> > > graphic example of a boxplot, including some text 
> > > pointing to the main features of the boxplot
> > > and defining and explaining these 
> > > features. Perhaps this
> > > could be made a simple function (e.g., boxplot.example())
> > > and this function be included in the help entry. Then
> > > the user would just run boxplot.example() to
> > > see a commented graphic example. It's more
> > > difficult to understand a text describing
> > > the boxplot function than to see a commented
> > > graphic example.
> > > 
> > > 
> > > Agus
> > > 
> > > 
> > > Dr. Agustin Lobo
> > > Instituto de Ciencias de la Tierra (CSIC)
> > > Lluis Sole Sabaris s/n
> > > 08028 Barcelona SPAIN
> > > tel 34 93409 5410
> > > fax 34 93411 0012
> > > alobo at ija.csic.es
> > > 
> > > 
> > > On 22 Feb 2002, Peter Dalgaard BSA wrote:
> > > 
> > > > Jay Pfaffman <pfaffman at relaxpc.com> writes:
> > > > 
> > > > > Another naive stats question.  I'm trying to better understand what
> > > > > boxplots are telling me.  
> > > > > 
> > > > > I think what I see is the median and the boundaries of the 1st and 3rd
> > > > > quartiles.  The whiskers represent the range of the data unless there
> > > > > are points which are outside "range" (default: 1.5) times the distance
> > > > > from the median to that quartile.  Is that right? 
> > > > 
> > > > Not quite. 1.5 times the length of the entire box.
> > > > 
> > > > > I've read the
> > > > > documentation for boxplot numerous times, but don't quite understand
> > > > > it well enough to communicate it to my professor who's helping me with
> > > > > this project.  (You'll be relieved to know that neither of us fancies
> > > > > ourself a statistician!)
> > > > 
> > > > boxplot.stats.Rd had a typo and got updated recently in the
> > > > development and patch versions to read
> > > > 
> > > >   \item{coef}{this determines how far the plot ``whiskers'' extend out
> > > >     from the box.  If \code{coef} is positive, the whiskers extend to
> > > >     the most extreme data point which is no more than \code{coef} times
> > > >     the length of the box away from the box. A value of zero causes
> > > >     the whiskers to extend to the data extremes (and no outliers be
> > > >     returned).}
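
The rule in that help text can be sketched directly.  This is only an
illustration under stated assumptions: the data are made up, the names
(box, fence, whiskers, outliers) are mine, and the box length is taken
as the distance between the hinges returned by fivenum().

```r
# Whisker ends and outliers for coef = 1.5, worked out by hand.
set.seed(2)                        # arbitrary seed
x    <- c(rnorm(50), 4, -5)        # made-up sample with two extreme points
coef <- 1.5

h     <- fivenum(x)                # h[2], h[4] are the lower and upper hinges
box   <- h[4] - h[2]               # length of the entire box
fence <- c(h[2] - coef * box, h[4] + coef * box)

inside   <- x[x >= fence[1] & x <= fence[2]]
whiskers <- range(inside)          # most extreme points still within the fences
outliers <- x[x < fence[1] | x > fence[2]]

bs <- boxplot.stats(x, coef = coef)
print(whiskers); print(bs$stats[c(1, 5)])   # the two should agree
print(outliers); print(bs$out)
```

Note that the whiskers stop at actual data points, not at the fences
themselves.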
> > > > 
> > > > (for some reason this hasn't yet found its way to the online snapshot
> > > > manuals in http://stat.ethz.ch/R-alpha/R-devel/doc/html/ and friends.
> > > > Martin?)
> > > > 
> > > > 
> > > > > V&R (p. 122) claims that the hinges are "roughly quartiles," so
> > > > > perhaps my naive understanding is close enough.
> > > > 
> > > > Yes. The exact definition is slightly peculiar, but in compliance with
> > > > the original definition by Tukey. So I'm told, anyway.
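
A small made-up sample shows how the hinges can differ from the
quartiles that quantile() returns by default.  The data and variable
names below are purely illustrative.

```r
# Tukey hinges (used for the box) versus default sample quartiles.
y <- c(2, 4, 5, 7, 8, 10, 11, 13, 14, 17, 19, 20)   # made-up sample, n = 12

hinges    <- fivenum(y)[c(2, 4)]          # 6.0 and 15.5
quartiles <- quantile(y, c(0.25, 0.75))   # 6.5 and 14.75

print(hinges)
print(quartiles)
```

The difference matters mostly for small n, and shrinks as the sample
grows.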
> > > > 
> > > > 
> > > > > I've got a relatively small data set (n~=12).  I think it would help
> > > > > to see the data points plotted on top of the boxplots.  Here's what
> > > > > I'm doing now:
> > > > > 
> > > > >     par(las=2,ps=14,mar=c(15, 4, 4, 2))
> > > > >     boxplot(split(ranks,c(1:25)), names=items, notch=TRUE, horizontal=FALSE, add=FALSE)
> > > > > 
> > > > > If I could get the points of each of the 25 variables plotted on top
> > > > > of the box, that'd be great.
> > > > 
> > > > Not sure what you're doing there, but maybe some code like this could
> > > > help:
> > > > 
> > > >  x1<-rnorm(20)
> > > >  x2<-rnorm(20)
> > > >  boxplot(list(x1=x1,x2=x2))
> > > >  points(cbind(1,x1))
> > > >  points(cbind(2,x2))
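
The same idea extends to many groups with a loop.  This is a sketch
only: the data are simulated stand-ins (five groups rather than 25),
the names ranks, item and groups are assumptions, and the jitter is an
optional extra to separate tied values.

```r
# Overlay the raw points on each box of a grouped boxplot.
set.seed(3)                                          # arbitrary seed
ranks  <- sample(1:10, 12 * 5, replace = TRUE)       # made-up ranks, n = 12 per group
item   <- factor(rep(paste("item", 1:5), each = 12)) # made-up group labels
groups <- split(ranks, item)

boxplot(groups, las = 2)
for (i in seq_along(groups)) {
    # box i is centred at x = i; jitter horizontally so ties stay visible
    xi <- jitter(rep(i, length(groups[[i]])), amount = 0.1)
    points(xi, groups[[i]])
}
```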
> > > > 
> > > > 
> > > > -- 
> > > >    O__  ---- Peter Dalgaard             Blegdamsvej 3  
> > > >   c/ /'_ --- Dept. of Biostatistics     2200 Cph. N   
> > > >  (*) \(*) -- University of Copenhagen   Denmark      Ph: (+45) 35327918
> > > > ~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)             FAX: (+45) 35327907
> > > > -.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
> > > > r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
> > > > Send "info", "help", or "[un]subscribe"
> > > > (in the "body", not the subject !)  To: r-help-request at stat.math.ethz.ch
> > > > _._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._
> > > > 
> > > 
> > 
> > -- 
> > David A. James
> > Statistics Research, Room 2C-253            Phone:  (908) 582-3082       
> > Bell Labs, Lucent Technologies              Fax:    (908) 582-3340
> > Murray Hill, NJ 09794-0636
> > 
> 

-- 
David A. James
Statistics Research, Room 2C-253            Phone:  (908) 582-3082       
Bell Labs, Lucent Technologies              Fax:    (908) 582-3340
Murray Hill, NJ 09794-0636