[R-sig-ME] extracting p values for main effects of binomial glmm

Ben Bolker bbolker at gmail.com
Thu Mar 5 21:29:59 CET 2015



On 15-03-05 03:08 PM, John Maindonald wrote:
> An issue that has not been raised so far is whether the binomial or
>  other GLM type variance takes account of all relevant observation
>  level sources of variation.  This is pertinent for all
> generalised linear mixed models.
> 
> Over-dispersed binomial or Poisson responses are, in my
> experience, much more common (certainly in areas such as
> ecology) than strictly binomial or Poisson ones.  The effect on
> the variance can be huge, multiplying it by a factor of 3 or 4
> or more.  More generally, the multiplier may not be a constant
> factor.  This issue is most serious for comparisons where the
> relevant variances are the observation-level variances.
> 
> With glmer(), one way to handle this is to fit an observation level
>  random effect; the multiplier is not then a constant factor.
> glmer() did at one time allow a constant multiplier.  I think it
> unfortunate that this was removed, as it restricted the options
> available.
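[Editorial sketch: the observation-level random effect (OLRE) approach described above can be illustrated as follows. The data frame, column names, and simulation are hypothetical, not from the thread; lme4 is assumed.]

```r
library(lme4)

## simulated overdispersed Poisson data (hypothetical example)
set.seed(1)
d <- data.frame(g = factor(rep(1:10, each = 20)), x = rnorm(200))
d$y <- rpois(200, exp(0.5 + 0.3 * d$x + rnorm(200, sd = 0.7)))

## one random-effect level per observation; (1 | obs) absorbs the
## extra-Poisson variation on the log scale, so the implied
## overdispersion is not a constant multiplier
d$obs <- factor(seq_len(nrow(d)))
m_olre <- glmer(y ~ x + (1 | g) + (1 | obs),
                family = poisson, data = d)
summary(m_olre)
```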

  It was removed because we (Doug Bates in particular) decided that we
really didn't understand at a theoretical level what the quasi- model
was doing, and that it sometimes seemed to be producing unreasonable
outputs.  That doesn't mean it *can't* be understood at a theoretical
level -- in principle, it should act similarly to other conditional
response distributions with free scale parameters, which we do allow
(e.g. Gamma) -- but this is still on our very long list of things to try
to figure out.

> Using negative binomial errors is in principle another
> possibility, but such models can be difficult to get to
> converge.  If the model is wrong, it may well not converge,
> which is an obstacle to getting to the point where one has
> something half-sensible that does converge; this is the case
> for glm.nb() as well as for glmer.nb().

  For what it's worth, glmmADMB (which operates in a general
optimization framework, not in a GLM-like IRLS mode) allows both
NB2 (the negative binomial in its traditional parameterization,
V = mu*(1 + mu/k)) and NB1 (parameterized so that V = phi*mu, as
in the quasi-Poisson case).
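[Editorial sketch: in glmmADMB the two parameterizations are selected via the `family` argument. The data are simulated for illustration; glmmADMB is not on CRAN and must be installed separately.]

```r
library(glmmADMB)  # install from glmmadmb.r-forge.r-project.org

## hypothetical overdispersed count data
set.seed(1)
d <- data.frame(g = factor(rep(1:10, each = 20)), x = rnorm(200))
d$y <- rpois(200, exp(0.5 + 0.3 * d$x + rnorm(200, sd = 0.7)))

## NB2: V = mu * (1 + mu/k), the traditional parameterization
m_nb2 <- glmmadmb(y ~ x + (1 | g), family = "nbinom",  data = d)

## NB1: V = phi * mu, a linear mean-variance relationship as in
## the quasi-Poisson case
m_nb1 <- glmmadmb(y ~ x + (1 | g), family = "nbinom1", data = d)
```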



> 
> With predicted probabilities close to 0 or 1, or Poisson means
> close to 0, there is something more to worry about: the
> Hauck-Donner effect, where the standard errors for Wald
> statistics become nonsense and z-statistics shrink as the
> distance between the means being compared increases.

  Well, in this case just use profile confidence intervals or
LRTs.  At least it's usually pretty obvious when this is
happening.
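[Editorial sketch: both alternatives are one-liners in lme4. The model and simulated data are hypothetical, for illustration only.]

```r
library(lme4)

## hypothetical Poisson GLMM on simulated data
set.seed(1)
d <- data.frame(g = factor(rep(1:10, each = 20)), x = rnorm(200))
d$y <- rpois(200, exp(0.5 + 0.3 * d$x))

m_full <- glmer(y ~ x + (1 | g), family = poisson, data = d)

## likelihood-ratio test for x via explicit model comparison
m_red <- update(m_full, . ~ . - x)
anova(m_red, m_full)

## profile confidence intervals avoid the Wald standard errors
## that the Hauck-Donner effect corrupts
confint(m_full, parm = "x", method = "profile")
```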

> 
> I guess the message is that this is treacherous territory, and it 
> really is necessary to know what one is doing!

  Yes; we have all of the complexities of linear models, GLMs (which
seem to be the issues you are pointing out here), and mixed models on
top of that ...

> 
> I'd like to echo Jonathon Baron's comments on the "main effects
> in the presence of interactions" issue.
> 
> John Maindonald             email: john.maindonald at anu.edu.au 
> Wellington, NZ
> 
> On 6/03/2015, at 0:25, Henrik Singmann 
> <henrik.singmann at psychologie.uni-freiburg.de> wrote:
> 
>> Hi,
>> 
>> As others have pointed out, the issue of obtaining p-values for 
>> effects is not trivial and there are pitfalls. Nevertheless I
>> think that the easiest way to obtain what you want is to use
>> mixed(..., method = "LRT") from package afex (full disclaimer: I
>> am the main author of said package and function).
>> 
>> mixed() takes care of some of these issues, such as using
>> appropriate contrast coding and testing main effects in the
>> presence of interactions, in not uncontroversial but sensible
>> ways (e.g., it handles the latter by simply removing the main
>> effect from the model matrix in a brute-force manner and
>> comparing the full model with the model so reduced). The way
>> things are handled in mixed() might not be correct in all
>> cases, but it may provide exactly the combination of ease of
>> use and sensible defaults that is helpful in your case. In
>> other words, it tries to adopt an approach often taken by
>> commercial statistics packages.
>> 
>> You simply call mixed() instead of glmer(); it fits and refits
>> the various versions of the model needed to provide
>> likelihood-ratio tests for all effects, as you want. Note,
>> however, that mixed() cannot deal with a cbind() dependent
>> variable the way you have specified the model. What you need to
>> do instead is compute the proportion of "successes" and pass
>> the number of observations as weights. Assuming your data
>> reside in a data.frame df, this would result in something along
>> the following lines:
>> 
>> 
>> df$prop_weights <- df$Offspring + df$Eggs_Offspring
>> df$proportion_hatched <- df$Offspring / df$prop_weights
>> 
>> m1 <- mixed(proportion_hatched ~ Diet * Infection_status * Day +
>>   (1 | SubjectID) + (1 | Day), data = df, family = binomial,
>>   weights = df$prop_weights, method = "LRT")
>> 
>> However, as Ken Beath has said, the main effects may not be
>> sensible given the interactions. To assess the degree of this
>> threat for your data (which very much depends on the type and
>> size of the interactions), I suggest following the excellent
>> advice of fortune(193) to "take the unusual step of plotting
>> the data." This can be done nicely with, e.g., the effects
>> package, even for complicated interactions and binomial mixed
>> models.
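[Editorial sketch: effect displays of the kind suggested above can be produced with the effects package directly on a glmer fit. The data, factor levels, and effect sizes below are simulated for illustration and do not come from the thread.]

```r
library(lme4)
library(effects)

## simulated binomial data with a two-way interaction (hypothetical)
set.seed(2)
df <- expand.grid(Diet = factor(1:2), Day = factor(1:3),
                  SubjectID = factor(1:30))
df$n <- 20
eta <- -0.5 + 0.8 * (df$Diet == "2") * (df$Day == "3")
df$prop <- rbinom(nrow(df), df$n, plogis(eta)) / df$n

m1 <- glmer(prop ~ Diet * Day + (1 | SubjectID),
            weights = n, family = binomial, data = df)

## one effect display per high-order term, on the probability scale
plot(allEffects(m1))
```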
>> 
>> Hope this helps, Henrik
>> 
>> 
>> Am 04.03.2015 um 21:11 schrieb Megan Kutzer:
>>> Hi,
>>> 
>>> I'm fairly new to mixed models and have done a lot of reading 
>>> without much success. Unfortunately there is no one at my 
>>> institution who is really familiar with them so I thought I
>>> would try this list.
>>> 
>>> I'm running a binomial generalized linear mixed effects model
>>> and I need p-values for the main effects. I know this isn't
>>> entirely correct with this type of model but my supervisor
>>> wants the p-values!
>>> 
>>> The model is:
>>> 
>>> glmer (Proportion hatched ~ Diet * Infection status * Day + 
>>> (1|SubjectID) + (1|Day), family=binomial)
>>> 
>>> where,
>>> 
>>> Proportion hatched = cbind(Offspring, Eggs - Offspring)
>>> Diet is a factor with 2 levels
>>> Infection status is a factor with 4 levels
>>> Day is a factor with 3 levels
>>> 
>>> Using Subject ID number and Day as random effects is supposed
>>> to control for pseudoreplication in the model, although I am
>>> not entirely sure that this is specified in the correct way. I
>>> wanted to include experimental replicate here too but the model
>>> failed to converge.
>>> 
>>> My question is: is there a way to get p-values for the main
>>> fixed effects of Diet, Infection and Day?
>>> 
>>> If you need more specific model information or the model output
>>> I would be happy to provide it.
>>> 
>>> Thanks, Megan
>>> 
>>> [[alternative HTML version deleted]]
>>> 
>> 
>> -- Dr. Henrik Singmann Universität Zürich, Schweiz 
>> http://singmann.org
>> 
>> _______________________________________________ 
>> R-sig-mixed-models at r-project.org mailing list 
>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
> 
> _______________________________________________ 
> R-sig-mixed-models at r-project.org mailing list 
> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
> 




More information about the R-sig-mixed-models mailing list