[R-sig-ME] Replicating type III anova tests for glmer/GLMM

Fri Feb 26 17:52:04 CET 2016

Dear Prof. Fox,

Thank you for taking the time to clarify things. I'll take time to read
your text and think things over, but until then I think I'll keep
writing the comparisons in terms of means and deducing from them
the linear hypothesis to test, to be sure of what I am doing.

I don't fully understand the part of your answer that says « it explains
why it's possible to formulate the different types of tests in linear
models independently of the contrasts (regressors) used to code the
factors -- because fundamentally what's important is the subspace
spanned by the regressors in each model, which is independent of
coding. ».

As I understood the model, if we have a 2×2 design (A×B) for instance,
the subspace spanned by all predictors is a 4-dimensional space. In
this space, each dimension can be assigned to A, B, their interaction
and a constant. That is, each predictor is associated with a
different basis vector of this 4-dimensional space. But there are
several ways of defining the basis, which define different subspaces
associated with A, B and A×B, and these correspond to the different
codings. For instance, I can say (with 4 points)

µ  A  B  A×B      or  µ  A  B  A×B
1 -1 -1  +1           1  0  0  0
1 -1 +1  -1           0  0  1  0
1 +1 -1  -1           0  1  0  0
1 +1 +1  +1           0  1  1  1

and the subspaces associated with µ, A, B, and A×B are different in
these two codings (but as a whole, the 4-dimensional space is the
same).  I may be missing something trivial, but I would say that the
coding instead defines the subspace spanned by each regressor, and not
that they are independent.
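
The claim in the paragraph above can be checked numerically. The following is only a minimal sketch in plain Python (not R), using the two 4×4 matrices exactly as typed in the table; the helper `rank` and the variable names are hypothetical and introduced just for this illustration. It confirms that both codings span the whole 4-dimensional space, while the single column labelled A points in a different direction in each coding.

```python
from fractions import Fraction

def rank(rows):
    """Rank of a matrix (list of rows), by exact Gaussian elimination."""
    m = [[Fraction(x) for x in row] for row in rows]
    r = 0
    for c in range(len(m[0])):
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                f = m[i][c] / m[r][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
    return r

# Rows are the 4 points; columns are mu, A, B, AxB, copied from the table.
coding1 = [[1, -1, -1,  1],
           [1, -1,  1, -1],
           [1,  1, -1, -1],
           [1,  1,  1,  1]]
coding2 = [[1, 0, 0, 0],
           [0, 0, 1, 0],
           [0, 1, 0, 0],
           [0, 1, 1, 1]]

# Both sets of four columns span the full 4-dimensional space.
full_rank_1 = rank(coding1)
full_rank_2 = rank(coding2)

# The two "A" columns span the same line only if stacking them gives rank 1.
a1 = [row[1] for row in coding1]
a2 = [row[1] for row in coding2]
same_a_line = rank([a1, a2]) == 1

print(full_rank_1, full_rank_2, same_a_line)
```

This only restates the observation; it does not by itself say which subspaces matter for the tests.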

Am I too stuck on coding? But then, how is the subspace
associated with a regressor defined « absolutely »?

On Wed, Feb 24, 2016 at 04:08:42PM +0000, Fox, John wrote:
« Dear Emmanuel,
« 
« The questions you raise are sufficiently complicated that it's difficult to address them adequately in an email. My Applied Regression and Generalized Linear Models text, for example, takes about 15 pages to explain the relationships among regressor codings, hypotheses, and tests in 2-way ANOVA, working with the full-rank parametrization of the model, and it's possible (as Russell Lenth indicated) to work things out even more generally. 
« 
« I'll try to answer briefly, however.
« 
« 
« No need to apologize. I don't think that these are simple ideas.
« 
« > the expectation of the quadratic form, and I too quickly deduced that there
« > was an equivalent linear combination of the parameters as its « square root
« > », but this was obviously wrong since the L matrix in a Lt W L quadratic form
« > does not have to be a column matrix. Am I wrong thinking that typically in
« > such tests, the L matrix is precisely a multi-column matrix (hence also several
« > degrees of freedom associated), and that several contrasts are tested
« > simultaneously?
« 
« Thinking in terms of the full-rank parametrization, as used in R, each type-III hypothesis is that several coefficients are simultaneously 0, which can be simply formulated as a linear hypothesis assuming an appropriate coding of the regressors for a factor. Type-II hypotheses can also be formulated as linear hypotheses, but doing so is more complicated. The Anova() function uses a kind of projection, in effect defining a type-II test as the most powerful test of a conditional hypothesis such as no A main effect given that the A:B interaction is absent in the model y ~ A*B. This works both for linear models, where (unless there is a complication like missing cells) the resulting test corresponds to the test produced by comparing the models y ~ B and y ~ A + B, using y ~ A*B for the estimate of error variance (i.e., the denominator MS), and more generally for models with linear predictors, where it's in general possible to formulate the (Wald) tests in terms of the coefficient estimates and their covariance matrix.
« 
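As an illustration of a Wald test "in terms of the coefficient estimates and their covariance matrix", here is a minimal sketch in plain Python. It computes the usual Wald statistic (Lb)' (L V L')^-1 (Lb) for a hypothesis L b = 0 with several rows, i.e. several coefficients tested simultaneously. The coefficient vector `b`, the covariance matrix `V` and the hypothesis matrix `L` are made-up numbers for a hypothetical three-level factor, and the helper names are assumptions of this sketch; it is not a description of how Anova() is implemented.

```python
def matmul(a, b):
    """Product of two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

def solve(a, rhs):
    """Solve a x = rhs for a square matrix a (Gauss-Jordan, partial pivoting)."""
    n = len(a)
    m = [list(map(float, a[i])) + [float(rhs[i])] for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda i: abs(m[i][c]))
        m[c], m[p] = m[p], m[c]
        for i in range(n):
            if i != c:
                f = m[i][c] / m[c][c]
                m[i] = [x - f * y for x, y in zip(m[i], m[c])]
    return [m[i][n] / m[i][i] for i in range(n)]

def wald_statistic(L, b, V):
    """Wald statistic (Lb)' (L V L')^-1 (Lb) for the hypothesis L b = 0."""
    Lb = [sum(l * x for l, x in zip(row, b)) for row in L]
    Lt = [list(col) for col in zip(*L)]
    LVLt = matmul(matmul(L, V), Lt)
    return sum(x * y for x, y in zip(Lb, solve(LVLt, Lb)))

# Hypothetical estimates: intercept and two regressors coding a 3-level factor.
b = [10.0, 1.0, -2.0]
V = [[0.25, 0.00, 0.00],
     [0.00, 0.50, 0.25],
     [0.00, 0.25, 0.50]]
# Hypothesis: both factor coefficients are simultaneously 0 (2 rows, 2 df).
L = [[0, 1, 0],
     [0, 0, 1]]

w = wald_statistic(L, b, V)
df = len(L)
print(w, df)
```

The statistic would then be referred to a chi-square distribution with `df` degrees of freedom; the number of rows of `L` is what makes this a multi-degree-of-freedom test.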
« > 
« > To be precise, I call « contrast » a linear combination of the model parameters
« > with the constraint that the coefficients of this combination sum to 0; this is
« > the definition in French (« contraste »), but I may be using it wrongly in English?
« 
« I'd define a "contrast" as the weights associated with the levels of a factor for formulating a hypothesis, where the weights traditionally are constrained to sum to 0, and to differentiate this from a column of the model matrix, which I'd more generally term a "regressor." Often, a traditional set of contrasts for a factor, one less than the number of levels, are defined not only to sum to 0 but also to be orthogonal in the basis of the design. The usage in R is more general, where "contrasts" means the set of regressors used to represent a factor. Thus, contr.sum() generates regressors that satisfy the traditional definition of contrasts, as do contr.poly() and contr.helmert(), but the default contr.treatment() generates 0/1 dummy-coded regressors that don't satisfy the traditional definition of contrasts.
« 
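The distinction between traditional contrasts and regressors can be made concrete with a small sketch. The plain-Python functions below are hypothetical re-creations, written for this illustration, of the matrices that R's contr.treatment(), contr.sum() and contr.helmert() return for a factor with n levels (rows are levels, columns are regressors); they are not the R functions themselves. The check is simply whether each column sums to 0.

```python
def contr_treatment(n):
    """0/1 dummy coding: first level is the baseline (all zeros)."""
    return [[1 if i == j + 1 else 0 for j in range(n - 1)] for i in range(n)]

def contr_sum(n):
    """Sum-to-zero coding: last level is coded -1 in every column."""
    return [[-1 if i == n - 1 else (1 if i == j else 0) for j in range(n - 1)]
            for i in range(n)]

def contr_helmert(n):
    """Helmert coding: column j compares level j+2 with the levels before it."""
    return [[-1 if i <= j else (j + 1 if i == j + 1 else 0)
             for j in range(n - 1)] for i in range(n)]

def column_sums(m):
    return [sum(col) for col in zip(*m)]

def columns_orthogonal(m):
    """True if every pair of distinct columns has zero dot product."""
    cols = list(zip(*m))
    return all(sum(x * y for x, y in zip(cols[i], cols[j])) == 0
               for i in range(len(cols)) for j in range(i + 1, len(cols)))

for name, f in [("treatment", contr_treatment),
                ("sum", contr_sum),
                ("helmert", contr_helmert)]:
    m = f(3)
    print(name, m, column_sums(m), columns_orthogonal(m))
```

For three levels, the sum and Helmert columns each sum to 0 while the treatment columns do not; of the two sum-to-zero codings, only the Helmert columns are also mutually orthogonal.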
« > 
« > Second, I may have wrongly understood the definitions of the various tests,
« > and especially how they generalize from linear model to GLM/GLMM...
« > 
« > I thought type I was by taking the squared distance of the successive
« > orthogonal projections on the subspaces generated by the various terms, in
« > the order given in the model; type II, by ensuring that the term tested was
« > the last amongst terms of same order, after terms of lower order but before
« > terms of higher order; type III, by projecting on the subspace after removal
« > of the basis vectors for the term tested ─ hence its strong dependency on
« > the coding scheme, and the « drop1 » trick to get them.
« > 
« > Is this definition correct? Does it generalize to other kinds of models, or is
« > another definition required? Is it unambiguous? The SAS doc itself suggests
« > that various procedures call different kinds of things "type II".
« 
« Yes, if I've followed this correctly, it's correct, and it explains why it's possible to formulate the different types of tests in linear models independently of the contrasts (regressors) used to code the factors -- because fundamentally what's important is the subspace spanned by the regressors in each model, which is independent of coding. This approach, however, doesn't generalize easily beyond linear models fit by least squares. The approach taken in Anova() corresponds to this approach in linear models fit by least squares as long as the models remain full-rank and for type-III tests as long as the contrasts are properly formulated, and generalizes to other models with linear predictors.
« 
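The statement that the subspace spanned by the regressors of a whole model does not depend on the coding can be illustrated for the additive model y ~ A + B in a 2×2 design. The sketch below, in plain Python, assumes a constant intercept column together with either 0/1 dummy coding or +1/-1 sum-to-zero coding; the helper `rank` and all variable names are hypothetical. Two matrices have the same column space exactly when stacking them side by side does not increase the rank.

```python
from fractions import Fraction

def rank(rows):
    """Rank of a matrix (list of rows), by exact Gaussian elimination."""
    m = [[Fraction(x) for x in row] for row in rows]
    r = 0
    for c in range(len(m[0])):
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                f = m[i][c] / m[r][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
    return r

def sum_code(level):
    """+1 for the first level, -1 for the second (sum-to-zero coding)."""
    return 1 if level == 0 else -1

# One row per cell of the 2x2 design; columns: intercept, A, B.
cells = [(a, b) for a in (0, 1) for b in (0, 1)]
add_trt = [[1, a, b] for a, b in cells]                      # dummy coding
add_sum = [[1, sum_code(a), sum_code(b)] for a, b in cells]  # sum coding

# Same column space <=> putting the columns side by side adds no rank.
stacked = [r1 + r2 for r1, r2 in zip(add_trt, add_sum)]
same_model_space = rank(add_trt) == rank(add_sum) == rank(stacked)

# The single A column, taken on its own, does depend on the coding.
a_trt = [row[1] for row in add_trt]
a_sum = [row[1] for row in add_sum]
same_a_line = rank([a_trt, a_sum]) == 1

print(same_model_space, same_a_line)
```

Under these assumptions the model as a whole spans the same 3-dimensional subspace in both codings, even though the individual A column does not span the same line.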
« > 
« > However, I cannot see clearly which hypothesis is indeed tested in each case,
« > especially in terms of cell means or marginal means (and, when I really need
« > it, I start from them and select the contrasts I need).  Is there any
« > package/software that can print the hypotheses tested in terms of
« > means starting from the model formula?
« > Or is there any good reference that makes the link between the two?
« > For instance, a demonstration that the comparison of marginal means «
« > always » leads to a type XXX sum of squares?
« 
« This is where a complete explanation gets too lengthy for an email, but a shorthand formulation, e.g., for the model y ~ A*B, is that type-I tests correspond to the hypotheses A|(B = 0, AB = 0), B | AB = 0, AB = 0; type-II tests to A | AB = 0, B | AB = 0, AB = 0; and type-III tests to A = 0, B = 0, AB = 0. Here, e.g., | AB = 0 means assuming no AB interactions, so, e.g., the hypothesis A | AB = 0 means no A main effects assuming no AB interactions. A hypothesis like A = 0 is indeed formulated in terms of marginal means, understood as cell means for A averaging over the levels of B (not level means of A ignoring B).
« 
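The difference between "cell means for A averaging over the levels of B" and "level means of A ignoring B" only shows up in unbalanced data. The sketch below uses entirely made-up cell means and cell counts for a 2×2 design (all names and numbers are hypothetical) and computes both kinds of marginal mean for A in plain Python.

```python
# Hypothetical cell means and cell counts for an unbalanced 2x2 design.
cell_mean = {("a1", "b1"): 10.0, ("a1", "b2"): 20.0,
             ("a2", "b1"): 12.0, ("a2", "b2"): 22.0}
cell_n = {("a1", "b1"): 8, ("a1", "b2"): 2,
          ("a2", "b1"): 2, ("a2", "b2"): 8}

a_levels = ["a1", "a2"]
b_levels = ["b1", "b2"]

# Marginal means of A averaging the cell means over B with equal weights:
# the quantities compared by a hypothesis such as "A = 0" above.
unweighted = {a: sum(cell_mean[(a, b)] for b in b_levels) / len(b_levels)
              for a in a_levels}

# Level means of A ignoring B: each cell is weighted by its sample size.
weighted = {a: sum(cell_n[(a, b)] * cell_mean[(a, b)] for b in b_levels)
               / sum(cell_n[(a, b)] for b in b_levels)
            for a in a_levels}

print(unweighted)
print(weighted)
```

With these numbers the two levels of A differ by 2 when the cell means are averaged over B, but by 8 when B is ignored, because the cell counts tie A to B.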
« I realize that this is far from a complete explanation.
« 
« Best,
«  John

-- 
                                Emmanuel CURIS
                                emmanuel.curis at parisdescartes.fr

Page WWW: http://emmanuel.curis.online.fr/index.html