[R-sig-ME] compare fit of GLMM with different link/family

Phillip Alday phillip at alday.com
Thu Jan 27 07:09:14 CET 2022


Oh, have you been reading the Lo and Andrews paper
(https://doi.org/10.3389/fpsyg.2015.01171)? I've developed a little bit
of a reputation for my skepticism regarding that paper. :)

(On a purely practical point, I'll note that that paper discusses using
a gamma family model with identity link, but those often have
convergence issues in my experience, even beyond my philosophical points
below.)

To refine my previous comments a bit: I think AIC is as good as anything
else for comparing non-nested models, but that it's generally just very
hard to compare non-nested models without a domain- or even
problem-specific notion about what's "better". For nested models, we
can often ignore the difficulty of "how much better is a meaningful
difference?" by using the likelihood-ratio test and the conventional
significance framework, but that's not without problems. In other words,
my whole skepticism here is a mixture of "what do we mean by 'better'?"
and "are we doing anything that my inner mathematician would be
uncomfortable with?".

There is a fair amount of work looking for the "ideal" model of response
time -- what family, what link function, should we transform the
response time (e.g. log transform), etc. For me, it really depends on
what your hypothesis is like -- each of these choices corresponds to
different hypotheses and different assumptions. For example, if you
assume that your experimental manipulation will have multiplicative
effects, then log-transforming your response times seems reasonable
because additive effects on the log scale correspond to multiplicative
effects on the original scale. But this goes both ways: if you're
log-transforming "just" to address skew (e.g. the long right tail you
often see in RT experiments), then you're still changing the precise
hypothesis being tested. If you're using a non-identity link function,
then there is still a transformation going on, just in a different place
(which impacts whether the residual error is also transformed).
Similarly, your choice of model family reflects both an assumption about
the general shape of the conditional (~error) distribution and how you
weigh errors. (This point also ties into the use of e.g. the Student-t
distribution as a model family in robust statistics.)
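
To make the "transformation in a different place" point concrete, here
is a small sketch (with simulated data, so the numbers are purely
illustrative; plain lm/glm is used rather than the thread's glmer to
keep it self-contained) contrasting a log-transformed response with a
log link:

```r
## Sketch: log-transforming the response vs. using a log link.
## Simulated data only -- the point is *where* the error enters the model.
set.seed(1)
n  <- 200
x  <- rnorm(n)
## Multiplicative errors: log(rt) = 6 + 0.5 * x + noise
rt <- exp(6 + 0.5 * x + rnorm(n, sd = 0.3))

## (1) Transformed response: models E[log(rt)]; the *residual error*
##     is assumed normal on the log scale.
m_translog <- lm(log(rt) ~ x)

## (2) Log link with a Gamma family: models log(E[rt]); the error
##     distribution is Gamma on the original (untransformed) scale.
m_loglink <- glm(rt ~ x, family = Gamma(link = "log"))

## Both recover a slope near 0.5, but they are slopes for different
## quantities: E[log(rt)] vs. log(E[rt]). These differ whenever the
## conditional distribution is skewed (Jensen's inequality).
coef(m_translog)[["x"]]
coef(m_loglink)[["x"]]
```

The two slopes happen to agree here because the data were simulated
with multiplicative errors; with a different error structure, the two
models answer measurably different questions.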

To bring it back around, the questions I would ask myself are:
- What model structure best encodes my hypotheses?
- After fitting that model, does that model capture the overall
structure of my data? If not, then that's already a rejection of the
exact formulation of my hypotheses, but it might still be worthwhile to
do some exploratory work looking at other model formulations as a way
to update my hypotheses for the next experiment.

This is also where people worry about violating model assumptions --
whether a particular violation is bad depends on what you're looking at.
For example, heavier than normal tails will make your standard errors a
bit misleading, but usually won't mess up your estimates too much. But a
strong skew might mean that your estimates aren't good representations
of your data -- much in the same way that the mean is often not a good
summary of strongly skewed data (e.g., mean vs. median income as
measures of typical incomes).
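
The income analogy is easy to demonstrate with simulated log-normal
"incomes" (numbers illustrative only):

```r
## Strongly right-skewed data: the mean is pulled well above the median,
## so it is a poor summary of a "typical" value.
set.seed(1)
income <- rlnorm(1e5, meanlog = 10, sdlog = 1)
mean(income)    # roughly exp(10 + 1/2), i.e. ~36000
median(income)  # roughly exp(10),       i.e. ~22000
```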

If it's not already clear, I often think of models as quantitative
summaries of data. A good summary depends on pulling out the important
bits and the important bits are always dependent on your ultimate
(inferential) goals. :)

Also, since you're looking at RT, two more specific hints/tips. :)
- Reinhold Kliegl has pointed me towards using speed (i.e. 1/RT) instead
of reaction time (I think he's basing this on Box-Cox transformations),
and I've often been quite happily surprised by how well this works. The
inversion handles the long tails nicely and speed is easier to interpret
than log RT. And we often speak of things in terms of speed anyway,
e.g., we expect participants to be faster in one condition than another.

- Check out Jonas Lindeløv's great write-up on RT distributions:
http://lindeloev.net/shiny/rt/. It covers a few different
types of common hypotheses and how these can appear as distributions. I
would also like to note that I've had some RT experiments where the RT
was -- to the disbelief of the reviewers -- well represented by a normal
distribution when the right covariates were included. :)
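
A sketch of the speed idea (simulated RTs only; the Box-Cox check uses
MASS, and a lambda near -1 corresponds to the reciprocal, i.e. 1/RT,
transformation):

```r
library(MASS)  # for boxcox()

set.seed(1)
n    <- 300
cond <- rep(c(0, 1), length.out = n)
## Simulate RTs whose reciprocal (speed) is roughly normal
speed <- rnorm(n, mean = 2 + 0.3 * cond, sd = 0.3)  # responses per second
rt    <- 1 / speed                                  # seconds

## Box-Cox profile: a lambda near -1 suggests the reciprocal transform
bc <- boxcox(lm(rt ~ cond), plotit = FALSE)
bc$x[which.max(bc$y)]

## Model speed directly: the coefficient is on an interpretable scale
## ("this many responses per second faster in condition 1")
m_speed <- lm(I(1 / rt) ~ cond)
coef(m_speed)[["cond"]]  # near the simulated 0.3
```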

I hope that helps!
Best,
Phillip



On 26/1/22 1:29 am, Dries Debeer via R-sig-mixed-models wrote:
> Dear Ben and Phillip, 
> 
> Thank you for your response and the pointers to the discussion! 
> 
> The reason for asking was finding the best fitting model for response times in an experimental setting. I agree that theoretical/scientific arguments for choosing models can outweigh purely maximizing the fit.
> 
> Dries
> 
>> -----Original Message-----
>> From: R-sig-mixed-models <r-sig-mixed-models-bounces using r-project.org> On
>> Behalf Of Ben Bolker
>> Sent: woensdag 26 januari 2022 3:09
>> To: r-sig-mixed-models using r-project.org
>> Subject: Re: [R-sig-ME] compare fit of GLMM with different link/family
>>
>>    I mostly agree.
>>
>>    I would say that in general it's OK to compare models with different links,
>> families, etc. via AIC *as long as you don't explicitly transform the response
>> variable* -- i.e. you have to be careful comparing
>>
>>   lm(log(y) ~ ....)
>>
>> with
>>
>>    lm(y ~ ...)
>>
>> (you need a Jacobian term in the AIC expression to account for the change in
>> scaling of the density), but comparing basically
>>
>>    glm(y ~ ... , family = <anything>)
>>
>> should be OK. That said, there is a strong minority view (Phillip may belong to
>> this group) that says that using AIC to compare non-nested models is *not*
>> OK: e.g. see
>> https://stats.stackexchange.com/questions/116935/comparing-non-nested-models-with-aic/116951#116951
>> https://mathoverflow.net/questions/249448/use-of-akaike-information-criterion-with-nonnested-models
>>
>>   (Unfortunately, really understanding why this should or should not work
>> depends, I think, on understanding the rates of convergence of certain
>> asymptotic expressions ...)
>>
>>    I completely agree with Phillip on the rest, though, which is to say that you
>> should think about **why** you want to test all these different cases. It's
>> unlikely you're going to be able to frame *scientific* hypotheses in terms of
>> these different models ("is it better to measure consumption in gallons per
>> mile or miles per gallon?"). If you're purely interested in prediction, then I
>> think AIC will often be an adequate approximation to something based on
>> cross-validation (but it would be good to check with CV). On the other hand,
>> if you're purely interested in prediction you might want to move in the
>> direction of nonparametric models such as GAMs, which should make many
>> of the distinctions between links irrelevant ...
>>
>>
>>
>>
>> On 1/25/22 12:56 PM, Phillip Alday wrote:
>>>
>>> On 25/1/22 11:04 am, Dries Debeer via R-sig-mixed-models wrote:
>>>> Dear,
>>>>
>>>>
>>>> I have a question about comparing the fit of GLMM with different link
>> functions/families.
>>>>
>>>> For instance, can the deviance or the AIC be used to compare the fit of
>> probit and logit with the same parametrization?
>>>>
>>>> probit_model <- glmer(Y ~ A + B + C*D + (A | subjects), data = data,
>>>> family = binomial(link = "probit")) logit_model <- glmer(Y ~ A + B +
>>>> C*D + (A | subjects), data = data, family = binomial(link = "logit"))
>>>
>>> This is a surprisingly tough question, in my opinion. Neither the AIC
>>> nor the deviance depend on the link itself, so in theory, you could
>>> compare them ... but these models are not nested, and comparing
>>> non-nested models is generally a tricky problem.  That said, probit
>>> and logit models will tend to give very similar results in terms of
>>> predictions/fit to the data. The bigger difference is how you
>>> interpret coefficients, so I would choose between probit and logit
>>> based on desired interpretation.
>>>
>>> For other families/links, the comparison can get even more difficult.
>>> For example, if you compare an inverse link with an identity link,
>>> then you are comparing two very different albeit related quantities --
>>> like comparing a model of "speed" vs "time".
>>>
>>>>
>>>>
>>>> And is this also possible when the distributional assumptions are
>> different? For instance:
>>>>
>>>> gamma_model <- glmer(X ~ A + B + C*D + (A | subjects), data = data,
>>>> family = Gamma(link = "inverse")) inverse_gauss <- glmer(X ~ A + B +
>>>> C*D + (A | subjects), data = data, family = inverse.gaussian(link =
>>>> "1/mu^2"))
>>>
>>> Not really, no. Both the deviance and the AIC are functions of the log
>>> likelihood and the choice of family corresponds to a choice of
>>> likelihood, so you're comparing different things.
>>>
>>> Depending on what you're going for, looking at predictive power of the
>>> models directly -- such as looking at mean squared or mean absolute
>>> error computed with cross validation -- might work.
>>>
>>> That said, the choice of family is a statement about your assumptions
>>> and prior beliefs about the data. In a Bayesian context, McElreath has
>>> described this as a "prior about the data" in Statistical Rethinking.
>>> Gelman et al have also noted that the prior can only be understood in
>>> the context of the likelihood -- all hinting at the core idea here,
>>> namely that the family is an assumption about the conditional
>>> distribution of your data (or equivalently, about the the distribution
>>> of the error/noise in your data).
>>>
>>> My previous point about the choice of link changing interpretation
>>> also holds for changes in link accompanying changes in family -- the
>>> statements you can make about your data based on an inverse link vs an
>>> inverse square link are different.
>>>
>>> I would be happy to hear other opinions here.
>>>
>>> Hope that helps,
>>> Phillip
>>>
>>>>
>>>> Thank you!
>>>> Dries Debeer
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> 	[[alternative HTML version deleted]]
>>>>
>>>> _______________________________________________
>>>> R-sig-mixed-models using r-project.org mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>>>>
>>>
>>>
>>
>> --
>> Dr. Benjamin Bolker
>> Professor, Mathematics & Statistics and Biology, McMaster University
>> Director, School of Computational Science and Engineering
>> (Acting) Graduate chair, Mathematics & Statistics
>>
> 
> _______________________________________________
> R-sig-mixed-models using r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>


