[R-sig-ME] Replicating type III anova tests for glmer/GLMM

Sun Feb 28 16:34:06 CET 2016

Dear Emmanuel,

Again, I'll respond briefly, and not in the detail that your questions really require:

> -----Original Message-----
> From: Emmanuel Curis [mailto:emmanuel.curis at parisdescartes.fr]
> Sent: February 26, 2016 11:52 AM
> To: Fox, John <jfox at mcmaster.ca>
> Cc: Francesco Romano <francescobryanromano at gmail.com>; r-sig-mixed-
> models at r-project.org
> Subject: Re: [R-sig-ME] Replicating type III anova tests for glmer/GLMM
> 
> Dear Pr Fox,
> 
> Thanks for the time taken clarifying things. I'll take time to read your text,
> and think over things, but I think that until that I'll stay with the writing of the
> comparisons in terms of means and deduce the linear hypothesis to test, to
> be sure of what I do.
> 
> I don't understand well, in your answer, the part saying « it explains why it's
> possible to formulate the different types of tests in linear models
> independently of the contrasts (regressors) used to code the factors --
> because fundamentally what's important is the subspace spanned by the
> regressors in each model, which is independent of coding. ».

The subspace spanned by the regressors in a model like y ~ A*B or y ~ A + B is independent of the coding of the regressors.

> 
> As I understood the model, if we have a 2×2 design (A×B) for instance, the
> subspace spanned by all predictors is a 4-dimensionnal space. In this space,
> each dimension can be assigned to A, B, their interaction and a constant. That
> means, each predictor is associated with a different basis vector of this 4-
> dimensionnal space. But there is several ways of defining the basis, defining
> different sub-spaces associated with A, B and A×B, and this corresponds to
> the different codings. For instance, I can say (with 4 points)
> 
> µ A   B A×B       or  µ  A  B  A×B
> 1 -1 -1  +1           1  0  0  0
> 1 -1 +1  -1           0  0  1  0
> 1 +1 -1  -1	      0  1  0  0
> 1 +1 +1  +1	      0  1  1  1
> 

In both cases, models like y ~ A*B and y ~ A + B produce the same y-hat vectors and hence same SSs. The situation is a bit more complicated for models that violate marginality, but that situation can be handled by more general approaches, like estimable functions or close attention to the hypotheses tested. All tests can be formulated linear hypothesis in the parameters of the full, full-rank model, but different parametrizations make the tests simpler or more difficult.

You've shown the row-basis for the model matrix in the cases of effect ("contr.sum") coding and dummy ("contr.treatment") coding. Call the basis matrix X_B. Then, because these are full-rank parametrizations, as long as no cells are empty, you can solve for the cell means in terms of the model parameters. Call the parameter vector corresponding to the basis beta_B and the ravelled vector of cell means mu. Then mu = X_B beta_B and (because X_B is nonsingular), beta_B = X_B^-1 mu. This allows you to see the composition of each parameter in terms of cell means and thus the hypothesis tested by the (type-III) test that the parameter is 0. In the case of effect coding, the columns of X_B are orthogonal and so its inverse is particularly simple, with each row equal to a column of X_B up to a constant factor.

> and the sub-spaces associated with µ, A, B, and A×B are different in these
> two codings (but in whole, the 4-dimensionnal space is the same).  I may miss
> something trivial, but I would say that the coding instead defines the
> subspace spanned by the regressor, and not that they are independant.
> 
> Am I too stuck with coding? But then, how is defined the subspace
> associated to a regressor « absolutly »?

It's not, but because the model matrix spans the same subspace, it's possible to test the same hypotheses in full-rank formulations of the same model. One way to see that is to work backwards from beta_B = X_B^-1 mu (that is, define X_B^-1 as the contrasts that you want to test) to mu = X_B beta_B. As mentioned, this is particularly simple when the *rows* of X_B^-1 are orthogonal contrasts.

John

> 
> 
> On Wed, Feb 24, 2016 at 04:08:42PM +0000, Fox, John wrote:
> « Dear Emmanuel,
> «
> « The questions you raise are sufficiently complicated that it's difficult to
> address them adequately in an email. My Applied Regression and
> Generalized Linear Models text, for example, takes about 15 pages to explain
> the relationships among regressor codings, hypotheses, and tests in 2-way
> ANOVA, working with the full-rank parametrization of the model, and it's
> possible (as Russell Lenth indicated) to work things out even more generally.
> «
> « I'll try to answer briefly, however.
> «
> «
> « No need to apologize. I don't think that these are simple ideas.
> «
> « > the expectation of the quadratic form, and I too quickly deduced that
> there « > was an equivalent linear combination of the parameters as its «
> square root « > », but this was obviously wrong since the L matrix in a Lt W L
> quadratic form « > does not have to be a column matrix. Am I wrong thinking
> that typically in « > such tests, the L matrix is precisely a multi-column matrix
> (hence also several « > degrees of freedom associated), and that several
> contrasts are tested « > simultaneously?
> «
> « Thinking in terms of the full-rank parametrization, as used in R, each type-
> III hypothesis is that several coefficients are simultaneously 0, which can be
> simply formulated as a linear hypothesis assuming an appropriate coding of
> the regressors for a factor. Type-II hypotheses can also be formulated as
> linear hypotheses, but doing so is more complicated. The Anova() function
> uses a kind of projection, in effect defining a type-II test as the most
> powerful test of a conditional hypothesis such as no A main effect given that
> the A:B interaction is absent in the model y ~ A*B. This works both for linear
> models, where (unless there is a complication like missing cells), the resulting
> test corresponds to the test produced by comparing the models y ~ A and y ~
> A + B, using Y ~ A*B for the estimate of error variance (i.e., the denominator
> MS), and more generally for models with linear predictors, where it's in
> general possible to formulate the (Wald) tests in terms of the coefficient
> estimates and their covariance matrix.
> «
> « >
> « > I precise that I call « contrast » a linear combination of the model
> parameters « > with the constraint that the coefficients of this combination
> sum to 0 ─ this is « > the definition in French (« contraste »), but I may use it
> wrongly in English?
> «
> « I'd define a "contrast" as the weights associated with the levels of a factor
> for formulating a hypothesis, where the weights traditionally are constrained
> to sum to 0, and to differentiate this from a column of the model matrix,
> which I'd more generally term a "regressor." Often, a traditional set of
> contrasts for a factor, one less than the number of levels, are defined not
> only to sum to 0  but also to be orthogonal in the basis of the design. The
> usage in R is more general, where "contrasts" mean the set of regressors
> used to represent a factor. Thus, contr.sum() generates regressors that
> satisfy the traditional definition of contrasts, as do contr.poly() and
> contr.helmert(), but the default contr.treatment() generates 0/1 dummy-
> coded regressors that don't satisfy the traditional definition of contrasts.
> «
> « >
> « > Second, I may have wrongly understood the definitions of the various
> tests, « > and especially how they generalize from linear model to
> GLM/GLMM...
> « >
> « > I thought type I was by taking the squared distance of the successive « >
> orthogonal projections on the subspaces generated by the various terms, in
> « > the order given in the model; type II, by ensuring that the term tested
> was « > the last amongst terms of same order, after terms of lower order but
> before « > terms of higher order; type III, by projecting on the subspace
> after removal « > of the basis vectors for the term tested ─ hence its strong
> dependency on « > the coding scheme, and the « drop1 » trick to get them.
> « >
> « > Is this definition correct? Does it generalize to other kind models, or is « >
> another definition required? Is it unambiguous? The SAS doc itself suggests «
> > that various procedures call "type II" different kind of things « « Yes, if I've
> followed this correctly, it's correct, and it explains why it's possible to
> formulate the different types of tests in linear models independently of the
> contrasts (regressors) used to code the factors -- because fundamentally
> what's important is the subspace spanned by the regressors in each model,
> which is independent of coding. This approach, however, doesn't generalize
> easily beyond linear models fit by least squares. The approach taken in
> Anova() corresponds to this approach in linear models fit by least squares as
> long as the models remain full-rank and for type-III tests as long as the
> contrasts are properly formulated, and generalizes to other models with
> linear predictors.
> «
> « >
> « > However, I cannot see clearly which hypothesis is indeed tested in each
> case, « > especially in terms of cell means or marginal means (and, when I
> really need « > it, I start from them and select the contrasts I need).  Is there
> any « > package/software that allows to print the hypotheses testeds in
> terms of « > means starting from the model formula?
> « > Or is there any good reference that makes the link between the two?
> « > For instance, a demonstration that the comparison of marginal means « «
> > always » leads to a type XXX sum of square?
> «
> « This is where a complete explanation gets too lengthy for an email, but a
> shorthand formulation, e.g., for the model y ~ A*B, is that type-I tests
> correspond to the hypotheses A|(B = 0, AB = 0), B | AB = 0, AB = 0; type-II
> tests to A | AB = 0, B | AB = 0, AB = 0; and type-III tests to A = 0, B = 0, AB = 0.
> Here, e.g., | AB = 0 means assuming no AB interactions, so, e.g., the
> hypothesis A | AB = 0 means no A main effects assuming no AB interactions.
> A hypothesis like A = 0 is indeed formulated in terms of marginal means,
> understood as cell means for A averaging over the levels of B (not level
> means of A ignoring B).
> «
> « I realize that this is far from a complete explanation.
> «
> « Best,
> «  John
> 
> --
>                                 Emmanuel CURIS
>                                 emmanuel.curis at parisdescartes.fr
> 
> Page WWW: http://emmanuel.curis.online.fr/index.html