[R-sig-ME] compare fit of GLMM with different link/family

Tue Jan 25 18:56:15 CET 2022

On 25/1/22 11:04 am, Dries Debeer via R-sig-mixed-models wrote:
> Dear,
> 
> 
> I have a question about comparing the fit of GLMM with different link functions/families.
> 
> For instance, can the deviance or the AIC be used to compare the fit of probit and logit with the same parametrization?
> 
> probit_model <- glmer(Y ~ A + B + C*D + (A | subjects), data = data, family = binomial(link = "probit"))
> logit_model <- glmer(Y ~ A + B + C*D + (A | subjects), data = data, family = binomial(link = "logit"))

This is a surprisingly tough question, in my opinion. Both the AIC and
the deviance are computed from the same binomial likelihood whichever
link you use, so in theory, you could compare them ... but these models
are not nested, and comparing non-nested models is generally a tricky
problem. That said, probit and logit models will tend to give very
similar results in terms of predictions/fit to the data. The bigger
difference is how you interpret coefficients, so I would choose between
probit and logit based on desired interpretation.
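
As a rough illustration of how close the two fits tend to be, here is a
minimal sketch on simulated data. It uses plain glm() with made-up
variables (y, a) rather than the glmer() models from the question, purely
to stay self-contained; fitted() and AIC() work the same way on glmer
fits, where fixef() would take the place of coef().

```r
# Simulated binary data; all names and parameter values are illustrative assumptions.
set.seed(1)
n <- 2000
a <- rnorm(n)
y <- rbinom(n, size = 1, prob = plogis(-0.5 + 1.2 * a))

logit_fit  <- glm(y ~ a, family = binomial(link = "logit"))
probit_fit <- glm(y ~ a, family = binomial(link = "probit"))

# Fitted probabilities are nearly identical across the two links ...
pred_cor <- cor(fitted(logit_fit), fitted(probit_fit))
max_diff <- max(abs(fitted(logit_fit) - fitted(probit_fit)))

# ... while the coefficients live on different scales.
slope_ratio <- unname(coef(logit_fit)["a"] / coef(probit_fit)["a"])

print(c(pred_cor = pred_cor, max_diff = max_diff, slope_ratio = slope_ratio))
```

The slope ratio is the part that matters for interpretation: logit
coefficients are log odds ratios, probit coefficients are shifts on a
latent standard-normal scale, and the former are typically somewhat
larger in magnitude for the same data.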

For other families/links, the comparison can get even more difficult.
For example, if you compare an inverse link with an identity link, then
you are comparing two very different albeit related quantities -- like
comparing a model of "speed" vs "time".
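
A small sketch of the "speed" vs "time" point, again on simulated data
with plain glm() and hypothetical variable names (x, a): with an inverse
link the linear predictor describes 1/mean, with an identity link it
describes the mean itself, so the coefficients answer different questions.

```r
# Simulated positive response; names and values are illustrative assumptions.
set.seed(2)
n <- 500
a <- runif(n, 1, 3)
mu <- 2 + 1.5 * a                      # mean is linear in a on the identity scale
x <- rgamma(n, shape = 10, rate = 10 / mu)

inverse_fit  <- glm(x ~ a, family = Gamma(link = "inverse"))
identity_fit <- glm(x ~ a, family = Gamma(link = "identity"))

# For the inverse link, the link-scale prediction is the reciprocal of the predicted mean.
eta_inverse  <- predict(inverse_fit, type = "link")
mean_inverse <- predict(inverse_fit, type = "response")

# The slope for a is on the 1/mean scale in one model and on the mean scale in the other.
slopes <- c(inverse  = unname(coef(inverse_fit)["a"]),
            identity = unname(coef(identity_fit)["a"]))
print(slopes)
```

Here the mean increases with a, so the identity-link slope is positive
while the inverse-link slope is negative, even though both models
describe the same relationship.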

> 
> 
> And is this also possible when the distributional assumptions are different? For instance:
> 
> gamma_model <- glmer(X ~ A + B + C*D + (A | subjects), data = data, family = Gamma(link = "inverse"))
> inverse_gauss <- glmer(X ~ A + B + C*D + (A | subjects), data = data, family = inverse.gaussian(link = "1/mu^2"))

Not really, no. Both the deviance and the AIC are functions of the log
likelihood and the choice of family corresponds to a choice of
likelihood, so you're comparing different things.
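
To make the dependence on the likelihood concrete, here is a minimal
sketch with plain glm() on simulated data. The variable names and the log
link (chosen only to keep the fits numerically well behaved) are my
assumptions, not the models from the question. It reconstructs the AIC
from each model's own log likelihood.

```r
# Simulated positive response; names and values are illustrative assumptions.
set.seed(3)
n <- 500
a <- rnorm(n)
x <- rgamma(n, shape = 5, rate = 5 / exp(0.5 + 0.3 * a))

gamma_fit    <- glm(x ~ a, family = Gamma(link = "log"))
invgauss_fit <- glm(x ~ a, family = inverse.gaussian(link = "log"))

# AIC = -2 * logLik + 2 * (number of estimated parameters),
# where logLik is evaluated under the family chosen for that model.
aic_by_hand <- function(fit) {
  ll <- logLik(fit)
  -2 * as.numeric(ll) + 2 * attr(ll, "df")
}

print(c(gamma = AIC(gamma_fit), inverse_gaussian = AIC(invgauss_fit)))
print(c(gamma = deviance(gamma_fit), inverse_gaussian = deviance(invgauss_fit)))
```

The deviance is also defined relative to each family's own saturated
model, so the two deviances are not on a common scale.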

Depending on what you're going for, looking at predictive power of the
models directly -- such as looking at mean squared or mean absolute
error computed with cross validation -- might work.
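
A minimal sketch of what that could look like, using plain glm() on
simulated data. The helper cv_error(), the variable names, the log link
and the choice of 5 folds are all hypothetical choices for illustration.

```r
# Simulated positive response; names and values are illustrative assumptions.
set.seed(4)
n <- 600
dat <- data.frame(a = rnorm(n))
dat$x <- rgamma(n, shape = 5, rate = 5 / exp(0.5 + 0.3 * dat$a))

# Use the same random fold assignment for both models.
k <- 5
fold <- sample(rep(seq_len(k), length.out = n))

# Hypothetical helper: k-fold cross-validated MAE and MSE on the response scale.
cv_error <- function(family, data, fold) {
  err <- unlist(lapply(sort(unique(fold)), function(i) {
    fit  <- glm(x ~ a, family = family, data = data[fold != i, ])
    test <- data[fold == i, ]
    test$x - predict(fit, newdata = test, type = "response")
  }))
  c(mae = mean(abs(err)), mse = mean(err^2))
}

cv_results <- rbind(
  gamma            = cv_error(Gamma(link = "log"), dat, fold),
  inverse_gaussian = cv_error(inverse.gaussian(link = "log"), dat, fold)
)
print(cv_results)
```

With repeated measures per subject you would also have to decide whether
to hold out single observations or whole subjects, since the two answer
different questions. For glmer fits, predict() has the re.form and
allow.new.levels arguments for controlling how the random effects enter
the predictions.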

That said, the choice of family is a statement about your assumptions
and prior beliefs about the data. In a Bayesian context, McElreath has
described this as a "prior about the data" in Statistical Rethinking.
Gelman et al. have also noted that the prior can only be understood in
the context of the likelihood -- all hinting at the core idea here,
namely that the family is an assumption about the conditional
distribution of your data (or equivalently, about the distribution
of the error/noise in your data).

My previous point about the choice of link changing interpretation also
holds for changes in link accompanying changes in family -- the
statements you can make about your data based on an inverse link vs an
inverse square link are different.

I would be happy to hear other opinions here.

Hope that helps,
Phillip

> 
> Thank you!
> Dries Debeer
> 
> _______________________________________________
> R-sig-mixed-models using r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>