[Rd] boxplot by factor (Package base version 2.1.1) (PR#7976)
Peter Dalgaard
p.dalgaard at biostat.ku.dk
Tue Jun 28 14:57:42 CEST 2005
"Liaw, Andy" <andy_liaw at merck.com> writes:
> The issue is not with boxplot, but with split. boxplot.formula()
> calls boxplot(split(mf[[response]], mf[-response]), ...),
> but look at what split() returns when there are empty levels in
> the factor:
>
> > f <- factor(gl(3, 6), levels=1:5)
> > y <- rnorm(f)
> > split(y, f)
> $"1"
> [1] 0.4832124 1.1924811 0.3657797 1.7400198 0.5577356 0.9889520
>
> $"2"
> [1] -1.1296642 -0.4808355 -0.2789933 0.1220718 0.1287742 -0.7573801
>
> $"3"
> [1] 1.2320902 0.5090700 -1.5508074 2.1373780 1.1681297 -0.7151561
>
> The "culprit" is the following in split.default():
>
> f <- factor(f)
>
> which drops empty levels in f, if there are any. BTW, ?split doesn't
> mention what it does in such a situation. Perhaps it should?
>
> If this is to be "fixed", I suppose an additional argument, e.g.,
> drop=TRUE, can be added, and the corresponding line mentioned
> above changed to something like:
>
> if (drop || !is.factor(f)) f <- factor(f)
>
> Then this additional argument can be passed on from boxplot.formula() to
> split().
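
To make the proposal above concrete, here is a minimal sketch of a
split-like helper with the suggested drop argument. The name
split_sketch is hypothetical, and this is not the actual source of
split.default(); it only illustrates how the modified line would
behave on the example factor.

```r
# Hypothetical helper illustrating the proposed 'drop' argument;
# this is NOT the real split.default().
split_sketch <- function(x, f, drop = TRUE) {
  # Proposed line: re-factor only when dropping, or when f is not a factor
  if (drop || !is.factor(f)) f <- factor(f)
  lev <- levels(f)
  # One list element per level; empty levels give zero-length vectors
  out <- lapply(lev, function(l) x[!is.na(f) & f == l])
  names(out) <- lev
  out
}

f <- factor(gl(3, 6), levels = 1:5)
y <- seq_along(f)

names(split_sketch(y, f))                # levels 4 and 5 are dropped
names(split_sketch(y, f, drop = FALSE))  # all five levels are kept
```

With drop = FALSE the result has zero-length components for levels 4
and 5, which is what a caller would need in order to show the empty
groups.
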
Alternatively, I suspect that the intention was as.factor() rather
than factor(). It does require a bit of care to fix it that way,
though. There could be problems with empty levels popping up in
unexpected places.
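
For illustration only (a small sketch, not part of any proposed patch),
this is how the two calls differ on a factor with unused levels, and
how the retained empty levels can then show up downstream:

```r
f <- factor(gl(3, 6), levels = 1:5)

levels(factor(f))     # re-factoring drops the unused levels 4 and 5
levels(as.factor(f))  # as.factor() returns a factor unchanged, so all five remain

# The caveat: retained empty levels appear later, e.g. as zero counts
table(as.factor(f))
```
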
--
O__ ---- Peter Dalgaard Øster Farimagsgade 5, Entr.B
c/ /'_ --- Dept. of Biostatistics PO Box 2099, 1014 Cph. K
(*) \(*) -- University of Copenhagen Denmark Ph: (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk) FAX: (+45) 35327907