[Rd] boxplot by factor (Package base version 2.1.1) (PR#7976)

Tue Jun 28 14:57:42 CEST 2005

"Liaw, Andy" <andy_liaw at merck.com> writes:

> The issue is not with boxplot, but with split.  boxplot.formula() 
> calls boxplot(split(mf[[response]], mf[-response]), ...), 
> but look at what split() returns when there are empty levels in
> the factor:
> 
> > f <- factor(gl(3, 6), levels=1:5)
> > y <- rnorm(f)
> > split(y, f)
> $"1"
> [1] 0.4832124 1.1924811 0.3657797 1.7400198 0.5577356 0.9889520
> 
> $"2"
> [1] -1.1296642 -0.4808355 -0.2789933  0.1220718  0.1287742 -0.7573801
> 
> $"3"
> [1]  1.2320902  0.5090700 -1.5508074  2.1373780  1.1681297 -0.7151561
> 
> The "culprit" is the following in split.default():
> 
>     f <- factor(f)
> 
> which drops empty levels in f, if there are any.  BTW, ?split doesn't
> mention what it does in such a situation.  Perhaps it should?
> 
> If this is to be "fixed", I suppose an additional argument, e.g.,
> drop=TRUE, can be added, and the corresponding line mentioned
> above changed to something like:
> 
>     if (drop || !is.factor(f)) f <- factor(f)
> 
> Then this additional argument can be passed on from boxplot.formula() to 
> split().
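
The proposal quoted above could be sketched roughly as follows. This is
only an illustration: split_sketch() is a hypothetical stand-in, not the
real split.default(), and it handles just a single grouping factor. The
`drop` argument and its default are taken from the suggestion above.

```r
# Hypothetical helper (an assumption for illustration, not base R code).
split_sketch <- function(x, f, drop = TRUE) {
    # Proposed change: re-factor only when dropping is requested
    # or when f is not yet a factor.
    if (drop || !is.factor(f)) f <- factor(f)
    # Collect the elements of x that belong to each level of f.
    out <- lapply(levels(f), function(lev) x[f == lev])
    names(out) <- levels(f)
    out
}

f <- factor(gl(3, 6), levels = 1:5)   # levels 4 and 5 are empty
y <- seq_along(f)

names(split_sketch(y, f))                # "1" "2" "3"
names(split_sketch(y, f, drop = FALSE))  # "1" "2" "3" "4" "5"
```

With drop = FALSE the empty levels survive as zero-length components,
which is what boxplot would need in order to draw an empty slot for them.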

Alternatively, I suspect that the intention was as.factor() rather
than factor(). It does require a bit of care to fix it that way,
though. There could be problems with empty levels popping up in
unexpected places. 
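
A small sketch of the difference, reusing the factor from the example
above (nothing here is taken from the split.default() source itself):

```r
f <- factor(gl(3, 6), levels = 1:5)   # levels 4 and 5 are unused

lev_factor    <- levels(factor(f))     # "1" "2" "3": unused levels dropped
lev_as_factor <- levels(as.factor(f))  # "1" "2" "3" "4" "5": f is unchanged

# Retained empty levels then show up downstream, e.g. as zero counts:
counts <- table(as.factor(f))
print(counts)
```

The zero counts for levels 4 and 5 are the kind of empty level that
might surface in code that currently never sees one.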

-- 
   O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark          Ph: (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)                  FAX: (+45) 35327907