[R-sig-ME] compare fit of GLMM with different link/family

Thu Jan 27 06:58:17 CET 2022

(resending without a digital signature)

@Don: I think the part you're missing is that the likelihood depends on
the data, and if you transform the data (e.g. via log), then you've
changed the data and now have a different likelihood. A little bit more
precisely: the likelihood of the model is the probability density of
the data given the parameters, viewed as a function of the parameters
with the data held _fixed_.[*] For linear transformations of the data,
everything is fine, but for nonlinear transformations, you need to take
into account the distortion they introduce on the scale of that density,
which is what the Jacobian does. Digging down a bit deeper, a density
only turns into a probability under an integral, and any transformation
of the data corresponds to a change of variables in that integral. For
nonlinear transformations, that means you now have a Jacobian to deal
with. (For linear transformations, you can still be off by a
multiplicative constant, but that doesn't matter for finding the
location of the optimum, i.e. the parameters corresponding to the
maximum likelihood.)
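
To make the Jacobian point concrete, here is a minimal sketch in base R
for the lm(y ~ ...) vs. lm(log(y) ~ ...) comparison. The simulated data
and all object names (m_raw, m_log, log_jacobian, ...) are assumptions
made up for illustration; the only thing it relies on is the standard
change-of-variables identity for z = log(y), namely
f_Y(y) = f_Z(log y) / y.

```r
# Simulated data; names and values are illustrative assumptions only.
set.seed(1)
n <- 200
x <- runif(n, 0, 2)
y <- exp(1 + 0.5 * x + rnorm(n, sd = 0.3))  # strictly positive response

m_raw <- lm(y ~ x)       # likelihood is a density for y
m_log <- lm(log(y) ~ x)  # likelihood is a density for log(y), not for y

# Change of variables z = log(y): |dz/dy| = 1/y, so the log-likelihood
# on the scale of y picks up the term sum(log(1/y)) = -sum(log(y)).
log_jacobian <- -sum(log(y))

aic_raw       <- AIC(m_raw)
aic_log_naive <- AIC(m_log)                     # NOT comparable to aic_raw
aic_log_y     <- AIC(m_log) - 2 * log_jacobian  # comparable to aic_raw

# Sanity check: the corrected log-likelihood is the log-normal
# log-likelihood of y itself, evaluated at the ML estimates.
sigma_ml  <- sqrt(mean(residuals(m_log)^2))  # ML sigma, as used by logLik()
ll_lnorm  <- sum(dlnorm(y, meanlog = fitted(m_log), sdlog = sigma_ml,
                        log = TRUE))
ll_log_y  <- as.numeric(logLik(m_log)) + log_jacobian
all.equal(ll_log_y, ll_lnorm)

c(raw = aic_raw, log_naive = aic_log_naive, log_corrected = aic_log_y)
```

The uncorrected and corrected AIC for the log model differ by exactly
2 * sum(log(y)), which depends only on the data and not on the fit, so
it is the comparison across response scales that goes wrong, not the
fit of either model on its own scale.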

The "exact measurements" issue is, in the usual interpretation, handled
by treating the response as a draw from a distribution, so the
inexactness is already part of the probability model. There are some
types of models that can take estimated uncertainties around the
response into account -- this is more uncommon in the physical sciences
where measurement instruments have a well-calibrated uncertainty. But
this does come up in things like meta-analysis, where the response is
actually previous estimates and associated uncertainty. (For the
predictor variables, errors-within-measurement is a very different
story, at least on the frequentist side of things.) Your proposal of
changing everything to be a distribution corresponds well to the
Bayesian idea that basically everything is a random variable in the
technical sense, and you can choose how many levels down you want to
model that.
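
As a small illustration of the meta-analysis case, here is a sketch of
a fixed-effect (inverse-variance) pooled estimate in base R. The
estimates and standard errors are invented for the example, and this is
only one simple way of carrying known response uncertainties into a
model, not a claim about how any particular package does it.

```r
# Hypothetical study-level estimates and their reported standard errors.
est <- c(0.42, 0.30, 0.55, 0.38)
se  <- c(0.10, 0.20, 0.15, 0.08)

# Inverse-variance weights: more precise studies count for more.
w <- 1 / se^2

pooled    <- sum(w * est) / sum(w)  # pooled point estimate
pooled_se <- sqrt(1 / sum(w))       # its SE, treating each se as known

# The same point estimate falls out of a weighted intercept-only lm().
# Note that lm() estimates a residual variance instead of taking the
# se values as known, so its reported standard error will differ.
fit <- lm(est ~ 1, weights = w)

c(pooled = pooled, pooled_se = pooled_se, lm_intercept = unname(coef(fit)))
```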

My apologies to the math stats crowd; I know I haven't been particularly
rigorous, but I was hoping to convey the general intuition.

[*] There's some fine print here if we're talking about the likelihood
as a score function or a conditional probability, but the fine print
doesn't matter for the argument at hand.

On 1/26/22 18:12, Ben Bolker wrote:
>
>
> On 1/26/22 12:15 PM, Don Cohen wrote:
>>
>> not sure this should be sent to r-sig-mixed-models using r-project.org,
>> feel free to post it there if so
>>
>>   >    I would say that in general it's OK to compare models with different
>>   > links, families, etc. via AIC *as long as you don't explicitly transform
>>   > the response variable* -- i.e. you have to be careful comparing
>>   >
>>   >   lm(log(y) ~ ....)
>>   >
>>   > with
>>   >
>>   >    lm(y ~ ...)
>>   >
>>   > (you need a Jacobian term in the AIC expression to account for the
>>   > change in scaling of the density), but comparing basically
>>
>> This doesn't make any sense to me.
>> There are only two parts to AIC, log likelihood and a parameter
>> correction.  What does this transform have to do with either?
>> If you get a better loglik for the transformed version I'd just
>> say that model fits the data better.
>> (Whereas the parameter correction has to do with what you think
>> makes a model better outside of fit to data, and is more subjective.)
>>
>> Actually I do have a complaint about loglik -- I think it would
>> be fixed by really computing the probability of response given
>> inputs, and that this could be done pretty easily by simply
>> admitting that the responses are not exact measurements, and
>> changing them to ranges or at worst distributions.  Could THAT
>> be related to the transform problem?  If so then this seems like
>> a solution.
>>
>> Perhaps you can give me a reference to what I'm missing.
>> I also don't see what nested models have to do with this.
>