[R] Simple question about formulae in R!?

Bert Gunter gunter.berton at gene.com
Fri Aug 10 18:40:55 CEST 2012


Sheesh! Yes.
... and in the case where B is a factor with k levels and x is
continuous, the model ~B:x yields k+1 parameters: an intercept and one
slope on x per level of B (equivalently, a constant term, x, and k-1
interactions between x and the corresponding k-1 "contrasts" (which
they aren't really) for B). ~B*x would add the k-1 B main-effect
contrasts.
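A small sketch of those column counts (the factor B, the data, and k = 3 here are made up purely for illustration):

```r
## B: a 3-level factor (k = 3); x: continuous
B <- factor(rep(c("a", "b", "c"), each = 4))
x <- rnorm(12)

## Interaction only: intercept plus one slope on x per level of B
ncol(model.matrix(~ B:x))    # k + 1 = 4 columns

## Crossing adds the k - 1 main-effect contrasts for B
ncol(model.matrix(~ B * x))  # 2k = 6 columns
```

Inspecting colnames() of each matrix shows which parameterization R chose under the default treatment contrasts.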

But to be fair, this can get complicated, and model.matrix() and
friends are very sophisticated pieces of software (certainly way
beyond me). This whole discussion, of course, raises the (OT!) issue
of the widespread misuse of linear modeling by those with insufficient
background in linear algebra to understand the points S. Ellison
discusses. I won't go there, other than to say I have no clue what to
do about it (and I encounter it in my own practice!).

-- Bert

On Fri, Aug 10, 2012 at 9:16 AM, S Ellison <S.Ellison at lgcgroup.com> wrote:
>> > R in general tries hard to prohibit this behavior (i.e.,  including an
>> > interaction but not the main effect). When removing a main effect and
>> > leaving the interaction, the number of parameters is not reduced by
>> > one (as would be expected) but stays the same, at least
>> > when using model.matrix:
>
> Surely this behaviour has less to do with a dislike of interactions without both main effects (which we will necessarily use if we fit a simple two-factor nested model) than with the need to avoid non-uniqueness of a model fitted with too many coefficients?
>
> In a simple case, an intercept plus n coefficients for n factor levels gives us n+1 coefficients to find, and we have only n independent groups to estimate them from. In model-matrix terms we would have one column that is a linear combination of the others. For the OLS normal equations that generates a zero determinant, and for the numerical methods R uses the effect is the same: no useful fit.
>
> To avoid that and allow least-squares fitting, R sets up the model matrix with only n-1 coefficients in addition to the intercept. As a result we end up with fewer model coefficients than we might have expected (and that annoyingly missing first level that always puzzles newcomers the first time they look at a linear model summary), but we have exactly the number of coefficients that we can estimate uniquely from the groups we have specified.
>
> S
>
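The rank-deficiency argument in the quoted reply can be checked directly with qr(); this is a sketch using a made-up three-level factor f with two observations per group:

```r
## A 3-level factor, two observations per group
f <- factor(rep(c("g1", "g2", "g3"), each = 2))

## Default coding: intercept + (n - 1) contrasts -> full column rank
m_default <- model.matrix(~ f)
qr(m_default)$rank == ncol(m_default)  # TRUE (3 columns, rank 3)

## Hand-built intercept + all n dummy columns: the dummies sum to the
## intercept column, so one column is a linear combination of the others
m_full <- cbind(1, sapply(levels(f), function(l) as.numeric(f == l)))
ncol(m_full)     # 4 columns ...
qr(m_full)$rank  # ... but rank only 3: the coefficients are not
                 # uniquely estimable
```

This is exactly the over-parameterized matrix R avoids by dropping the first level under the default treatment contrasts.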


