[Rd] droplevels: drops contrasts as well

Thaler, Thorn, LAUSANNE, Applied Mathematics Thorn.Thaler at rdls.nestle.com
Tue Oct 25 09:26:05 CEST 2011


> > I think this behaviour is annoying, because if one does not look
> > carefully enough, one loses the contrasts silently. Hence may I
> > suggest to change the code of droplevels to something like the
> > following:
> 
> This silently changes the contrasts -- eg, if the first level of the
> factor is one of the empty levels, the reference level used by
> contr.treatment() will change.  Also, if the contrasts are a matrix
> rather than specifying a contrast function, the matrix will be invalid
> for the new factor.

Well, you are right and while I'm not so much concerned about the first
issue you've outlined (the change in the baseline - I think if I decide
to drop unused levels, I'm aware that a non-existing level cannot be the
baseline any more), the second point is clearly an issue I've
overlooked. 
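
To make the second point concrete, here is a minimal sketch (the toy factor and the variable names are made up for illustration). It assumes that subsetting with `[` keeps the "contrasts" attribute while droplevels() discards it, and it shows that the old contrast matrix no longer fits the reduced factor:

```r
# Toy factor with three levels and an explicit 3 x 2 contrast matrix
f <- factor(c("a", "b", "c", "a", "b"))
contrasts(f) <- contr.sum(3)

# Plain subsetting keeps all levels and the "contrasts" attribute
g <- f[f != "c"]
old <- attr(g, "contrasts")

# droplevels() rebuilds the factor, so the attribute is lost silently
h <- droplevels(g)
lost <- is.null(attr(h, "contrasts"))

# Re-using the old matrix fails: it has three rows, but the reduced
# factor has only two levels
res <- try(contrasts(h) <- old, silent = TRUE)
failed <- inherits(res, "try-error")
```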

> I think just having a warning would be better -- in general it's not
> clear what (if anything) it means to have the same contrasts on
> factors with different numbers of levels.

That would be an option, and I think it should be the minimum. Still, I
think a behaviour like:
1.) if the contrasts are defined as a matrix, issue a warning and use the
default contrasts (that is, nothing changes compared to now, except that
a warning is issued)
2.) if the contrasts are defined as a function, use the function to
re-compute the contrasts.

would be more desirable, as contrasts can be seen as a general setting
of how coefficients should be interpreted too (e.g. for a balanced data
set with sum "contrasts", the intercept corresponds to the overall mean,
beta1 to the difference of the overall mean and group 1 and so on),
rather than looking at them from the literal point of view (e.g. "I want
to compare level A vs level B & C"). While from the latter point of view
I agree that the same contrasts on factors with different numbers of
levels are not really meaningful, I still see the benefit if I take the
other point of view: if I drop a level, I may still be interested in
comparing the overall mean with the group means, bearing in mind that
some groups may no longer be present in the data set.
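
The two cases above could be sketched roughly as follows. This is only an illustration, not a proposed patch: `droplevels_keep_contrasts` is a hypothetical helper name, and the sketch assumes that "contrasts defined as a function" means the attribute holds the function's name as a character string (assigning a function object via `contrasts<-` stores the evaluated matrix instead).

```r
# Hypothetical helper, not part of base R: drop unused levels, but carry
# the contrast specification over where that is well defined.
droplevels_keep_contrasts <- function(x, ...) {
  stopifnot(is.factor(x))
  old <- attr(x, "contrasts")
  y <- droplevels(x, ...)
  if (is.null(old)) return(y)            # no contrasts were set

  if (is.character(old)) {
    # Case 2: contrasts given by function name; they are re-computed
    # for the remaining levels whenever contrasts(y) is evaluated
    if (nlevels(y) >= 2L) contrasts(y) <- old
  } else if (identical(levels(y), levels(x))) {
    # Nothing was dropped, so the old matrix still fits
    contrasts(y) <- old
  } else {
    # Case 1: a contrast matrix cannot be reused on fewer levels
    warning("contrast matrix dropped; default contrasts will be used")
  }
  y
}

f <- factor(c("a", "b", "c", "a", "b"))
contrasts(f) <- "contr.sum"              # stored by name
h <- droplevels_keep_contrasts(f[f != "c"])
contrasts(h)                             # sum contrasts for two levels
```

With a matrix instead of a name (e.g. `contrasts(f) <- contr.sum(3)`), the same call would only issue the warning and return a factor without a "contrasts" attribute.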

Do you see my point? However, it is not the biggest issue, as one can
change the contrasts rather easily oneself, but I think at least some
information/warning should be issued that the old contrasts are not used
any more.
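
For completeness, re-setting the contrasts by hand after droplevels() might look like the sketch below (the data frame `d` and its values are invented for illustration). With balanced data and sum contrasts, the intercept is the mean of the remaining group means, which is the interpretation described above.

```r
# Invented balanced toy data: three groups, two observations each
d <- data.frame(g = factor(rep(c("a", "b", "c"), each = 2)),
                y = c(1, 3, 4, 6, 10, 12))
contrasts(d$g) <- "contr.sum"

# Drop group "c"; droplevels() discards the contrasts, so set them again
d2 <- d[d$g != "c", ]
d2$g <- droplevels(d2$g)
contrasts(d2$g) <- "contr.sum"

fit <- lm(y ~ g, data = d2)
coef(fit)   # intercept: mean of the group means of "a" and "b";
            # g1: mean of group "a" minus that overall mean
```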


KR,

-Thorn


> 
> 
>    -thomas
> 
> --
> Thomas Lumley
> Professor of Biostatistics
> University of Auckland


