[R] boxplots of 1 (or 2 or 3) "datum" ..

Wed Mar 15 12:26:47 CET 2000

>>>>> "Dan" == Dan E Kelley <kelley at Phys.Ocean.Dal.CA> writes:

    Dan> Q: When R does 'plot()' in a context that yields boxplots, is there a
    Dan> way to force it to draw something even if there are only 1 or two data
    Dan> in the category?  I'd like for it to draw the data, perhaps using the
    Dan> outlier symbols.  My code (*** marks the line in question) is the
    Dan> following, for R-1.0.0:

    Dan> d <- read.table("nserc-results-pgsb", header=FALSE, 
    Dan>                 col.names=c("name","dept","rank","accept"))
    Dan> # These data look like:
    Dan> #   First.Student   Some.Department     1  1
    Dan> #   Second.Student  Another.Department  2  1
    Dan> #   Third.Student   Another.Department  3  0
but contain more than just three observations, right ?

    Dan> attach(d)
    Dan> rank.inv <- 1/rank
    Dan> ll <- lm(accept ~ rank.inv + dept, data=d)
    Dan> print(summary(ll))
    Dan> print(anova(ll))
    Dan> plot(dept,resid(ll))	# makes boxplots ***

    Dan> Actually, if anybody has a bright idea how I should analyse such data,
    Dan> I'd love to hear it.  As you can see in the above, I transformed to
    Dan> 1/rank since our committee recorded high 'rank' values for students we
    Dan> favoured.  It's not clear to me how to compare rankings to boolean
    Dan> (accept/deny) results, so the 'lm()' above might be silly.

I may have misunderstood you completely.
The problem is that I cannot reproduce your example, since you didn't use
"public" data.
(Usually, you'd construct data, something like
	 d <- data.frame(accept = rbinom(100, size = 1, prob = 0.4),
	                 rank   = sample(1:100),
	                 dept   = gl(5, 20))
)
Are you discussing the boxplots that are produced with only 1 or 2
observations per group?

Here are boxplots for n=1, 2, 3, and 4 obs. per group.
What's wrong with these ?

   do.call("boxplot", lapply(1:4,seq))
   title("Boxplot()s of very few points")
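To see which numbers those boxes are built from, one can compute the
summaries without plotting. This is only a minimal sketch: it uses
boxplot.stats() on the same four toy groups, and the object names
(small.groups, stats) are just made up for illustration.

```r
# Five-number summaries that boxplot() would draw for n = 1, 2, 3, 4
small.groups <- lapply(1:4, seq)        # the vectors 1, 1:2, 1:3, 1:4
stats <- sapply(small.groups, function(x) boxplot.stats(x)$stats)
colnames(stats) <- paste0("n=", 1:4)
rownames(stats) <- c("lower whisker", "lower hinge", "median",
                     "upper hinge", "upper whisker")
print(stats)
```

For n = 1 all five values coincide, so the "box" collapses to a single
horizontal line.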

*Or* are you suggesting that for n=1, n=2 (and maybe n=3) per group
        plot(factor, continuous)
shouldn't use boxplot()s but rather dot plots ?
This is a suggestion that I've heard and had myself before,
very well worth discussing.

- How should the decision boxplot / dotplot be made, just depend on n?
  Wouldn't one want the box + the single observations, e.g. when in
  one group n = 3, but in all other groups n ~= 20 (which would make
  boxplots there in any case)?
- (When) should jittering be used ?
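
One possible answer to both questions, as a rough sketch only: always draw
the boxes, and overlay the jittered raw observations in groups with fewer
than some threshold. The helper name box_with_points, the threshold
n.min = 5 and the jitter amount are all assumptions made up for this
illustration, not anything plot() currently does.

```r
# Hypothetical helper: boxplots for every group, plus the raw points
# (horizontally jittered) in every group with fewer than n.min observations.
box_with_points <- function(y, g, n.min = 5, jitter.amount = 0.05) {
    groups <- split(y, g)                # one numeric vector per factor level
    boxplot(groups)                      # boxes are drawn at x = 1, 2, ...
    small <- which(sapply(groups, length) < n.min)
    for (i in small) {
        yi <- groups[[i]]
        xi <- i + runif(length(yi), -jitter.amount, jitter.amount)
        points(xi, yi, pch = 1)
    }
    invisible(names(groups)[small])      # names of the groups that got points
}

# Constructed data: group sizes 1, 2, 3 and 20
set.seed(1)
g <- factor(rep(c("a", "b", "c", "d"), times = c(1, 2, 3, 20)))
y <- rnorm(length(g))
overlaid <- box_with_points(y, g)
```

Here the groups with n = 1, 2, 3 get their observations drawn on top of the
(degenerate) boxes, while the group with n = 20 is left as a plain boxplot.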

Regards,
Martin Maechler <maechler at stat.math.ethz.ch>	http://stat.ethz.ch/~maechler/
Seminar fuer Statistik, ETH-Zentrum  LEO D10	Leonhardstr. 27
ETH (Federal Inst. Technology)	8092 Zurich	SWITZERLAND
phone: x-41-1-632-3408		fax: ...-1228			<><