[Rd] Design choice of plot.design for formulas

Thaler,Thorn,LAUSANNE,Applied Mathematics Thorn.Thaler at rdls.nestle.com
Mon Mar 19 15:44:37 CET 2012

Dear all,

Today I found that the formula interface of plot.design is somewhat
counterintuitive. Consider the following setting:

ddf <- expand.grid(a=factor(1:3), b=factor(1:3))
ddf$y <- rnorm(9)
plot.design(y ~ a + b, data=ddf)

which does what it should: it plots the means for the respective levels
of each factor. However, I had to learn that the function does not care
at all whether a variable is specified on the LHS or the RHS of the
formula. Thus, the following commands are all equivalent:

plot.design(~ y + a + b, data=ddf)
plot.design(a ~ y + b, data=ddf)
plot.design(b ~ y + a, data=ddf)
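A minimal sketch that reproduces this equivalence: classify_design_vars() is a hypothetical helper (not the source of graphics::plot.design) that builds the model frame and splits its columns into numeric ones ("responses") and factors ("stratification factors"), ignoring which side of the formula they came from. set.seed(1) is added only for reproducibility.

```r
# Hypothetical helper: mimics a numeric/factor split of the model frame.
# It is NOT the actual plot.design code, only a sketch of the rule.
classify_design_vars <- function(formula, data) {
  mf <- model.frame(formula, data)
  list(responses = names(mf)[vapply(mf, is.numeric, logical(1L))],
       factors   = names(mf)[vapply(mf, is.factor, logical(1L))])
}

set.seed(1)  # added for reproducibility
ddf <- expand.grid(a = factor(1:3), b = factor(1:3))
ddf$y <- rnorm(9)

# The side of the formula a variable appears on makes no difference:
classify_design_vars(y ~ a + b, ddf)
classify_design_vars(a ~ y + b, ddf)
classify_design_vars(~ y + a + b, ddf)
```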

A closer look at the code revealed that the function essentially only
checks whether a variable is numeric or a factor. All factors are taken
to be stratification factors, while all numeric variables are taken to
be responses. While the former assumption makes sense, the latter is
misleading in conjunction with the formula interface:

ddf$z <- sample(3, 9, TRUE)
plot.design(y ~ a + z, data=ddf)

In my reading, this should produce a plot in which a and z are regarded
as stratification factors and y as the response. Instead, the function
regards both y and z as responses.
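A possible workaround, assuming it is acceptable to coerce z on the fly, is to wrap it in factor() inside the formula. The model-frame column "factor(z)" is then a factor, so the numeric/factor rule treats it as a stratification factor and leaves y as the only response. The snippet checks the column types before plotting; set.seed(1) is added only for reproducibility.

```r
set.seed(1)  # added for reproducibility
ddf <- expand.grid(a = factor(1:3), b = factor(1:3))
ddf$y <- rnorm(9)
ddf$z <- sample(3, 9, TRUE)  # integer, hence numeric

# Coerce z in the formula so that its model-frame column is a factor.
mf <- model.frame(y ~ a + factor(z), ddf)
sapply(mf, class)  # y numeric; a and factor(z) are factors

# y is now the only numeric variable, i.e. the only response.
plot.design(y ~ a + factor(z), data = ddf)
```

Converting the column once in the data (ddf$z <- factor(ddf$z)) has the same effect for all later calls.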

So my question is: is there a particular reason why the type of a
variable in a data frame (factor vs. numeric) takes precedence over its
position in the formula passed to plot.design? Is it because multiple
responses could not be specified otherwise? If so, I wonder whether an
approach like the one in lattice, where multiple responses can be given
on the LHS, would be useful:

ddf$y.new <- rnorm(9)
lattice::xyplot(y + y.new ~ a, data = ddf, pch = 15)
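To make the suggestion concrete, here is a rough sketch of how a formula-respecting interface might behave. plot_design_lhs() is a hypothetical wrapper, not part of graphics: it treats every variable named on the LHS (e.g. y + y.new) as a response, coerces every RHS variable to a factor, and passes the resulting data frame to plot.design(). It assumes plain variable names in the formula, since all.vars() silently drops transformations such as log(y).

```r
# Hypothetical wrapper (not part of graphics): LHS variables are responses,
# RHS variables are coerced to factors, regardless of their storage type.
plot_design_lhs <- function(formula, data, ...) {
  stopifnot(length(formula) == 3L)        # require a two-sided formula
  resp  <- all.vars(formula[[2L]])        # y + y.new -> c("y", "y.new")
  strat <- all.vars(formula[[3L]])        # a + z     -> c("a", "z")
  dd <- data[c(strat, resp)]
  dd[strat] <- lapply(dd[strat], factor)  # force stratification factors
  bad <- !vapply(dd[resp], is.numeric, logical(1L))
  if (any(bad))
    stop("non-numeric response(s): ", paste(resp[bad], collapse = ", "))
  plot.design(dd, ...)  # numeric columns = responses, factors = strata
  invisible(dd)
}

set.seed(1)  # added for reproducibility
ddf <- expand.grid(a = factor(1:3), b = factor(1:3))
ddf$y <- rnorm(9)
ddf$z <- sample(3, 9, TRUE)
ddf$y.new <- rnorm(9)

# z is stratified on even though it is stored as an integer.
prep <- plot_design_lhs(y + y.new ~ a + z, ddf)
```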