[R-sig-ME] compare fit of GLMM with different link/family

Ben Bolker bbolker at gmail.com
Wed Feb 2 03:05:27 CET 2022


   Getting back to this late.

On 1/27/22 4:46 PM, Don Cohen wrote:
> Phillip Alday writes:
> 
>   > @Don: I think the part you're missing is that the likelihood
>   > depends on the data and if you transform the data (e.g. via log),
>   > then you've changed the data and now have a different likelihood.
> 
> I'm not sure what you mean by changing the data, but the fact that
> you change the likelihood seems to be just as true for any other
> change to the model.
>   log(output) ~ input
> and
>   output ~ input
> are two different models just like they're both different from
>   output ~ input^2
> 
>   > precisely: the likelihood of the model is the probability of the
>   > parameters _conditional_ on the data.[*]
> 
> [I assume by parameters you mean what I call the output (the dependent
> variable) and by the data you mean what I call the inputs (the
> independent variables).]
> But this gets back to my argument below that the likelihood is not really
> the same as probability...
> 
>   > For linear transformations of the data, everything is fine,
> 
> But my example above with input^2 was not a linear transformation of the
> data, was it?  You don't think it's fair to compare loglik of
>   output ~ input  with that of  output ~ input^2?
> Oh, I guess not - that's your argument about nested models.
> But I also don't understand that.

   I think Phillip meant "transform the *response variable*" specifically.
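
   As a minimal sketch (with made-up data), here is what goes wrong if
you compare logLik() values across a model for y and a model for
log(y), and how adding the log-Jacobian of the transformation (here,
-sum(log(y))) puts the two back on the same scale:

set.seed(101)
x <- runif(100)
y <- exp(1 + 2 * x + rnorm(100, sd = 0.3))   # lognormal response

m1 <- lm(y ~ x)        # likelihood evaluated on the scale of y
m2 <- lm(log(y) ~ x)   # likelihood evaluated on the scale of log(y)

logLik(m1)                            # not comparable to logLik(m2) as-is
logLik(m2)                            # density of log(y), not of y
as.numeric(logLik(m2)) - sum(log(y))  # Jacobian-corrected, now on y's scale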

> 
> It seems to me that conditional probability of output given model and
> input is a measure of how well the output fits the input+model and it
> makes sense to compare that even for different combinations of
> input, output, model.  I see that more rows of data will inevitably
> reduce that probability, so perhaps a good measure would be to divide
> log of prob by #rows, i.e., average log of probability per row.
> 
>   > but for nonlinear transformations, you need to take into account
>   > the distortion they introduce on the parameter space, which is what
>   > the Jacobian does. Digging down a bit deeper, the likelihood is
>   > ultimately an integral and any transformation of the data
> 
> I thought the likelihood was computed by just evaluating the PDF.
> Is that necessarily an integral?  Is that related to your
> description of treating the response as a distribution?

> 
> What you write above does not convey to me exactly what problem is
> being solved or how it's being solved, but I get the feeling that your
> transformation might be the same thing I was complaining about.
> See what you think:
> 
> My complaint is illustrated by the fact that the loglik can be
> positive - because the pdf can be > 1.  Whereas the actual probability
> could be computed by changing the output value to a range and taking
> the difference between the values of the cdf at the two ends of the
> range (maybe you'd call that integration).  If you did that, say, for
> an output of 1.23, which I'd require you to change to an interval, say
> [1.225, 1.235], then in order to compare the REAL probability (rather
> than the likelihood) of this model to that of another model using
> log(output), the interval would become [log(1.225), log(1.235)],
> right?  Does that seem to correspond to your correction?
> 
>   > (For linear transformations, you can still be off by a
>   > multiplicative constant, but that doesn't matter for finding the
>   > location of the optimum, i.e. the parameters corresponding to the
>   > maximum likelihood.)
> 
> Again I might not be following you, but I think this may be related to
> the fact that loglik can be positive -- which means to me that even
> though you've found the optimal estimates, your loglik is NOT a
> reasonable estimate of the PROBABILITY of the output given the input +
> model.  And for model comparison I would want the log of the
> probability, not something that could be off by some (arbitrarily
> large) constant that might be different for different models.
> 
> So if loglik is computed as I think it is, then it's questionable
> whether it can be compared between different models at all, whereas
> if log prob were computed as I describe, then it would make sense to
> compare it for any two models, even if the output were transformed.
> 
> I hope that makes sense?
> 
> Or, of course, tell me where I've gone wrong.
> 

   I think this all basically makes sense.  I would phrase it as saying 
that what we are doing when we calculate the "(log-)likelihood" of a 
*continuous* response is in practice calculating a (log) likelihood 
*density*; a density can be greater than 1, which is why the 
log-likelihood can be positive.  As Phillip suggests, if we write it 
out as a proper likelihood then there is an implicit 'delta-x' in the 
expression that turns the density into a probability.  When we take 
the log, that delta-x becomes an additive constant, and we know that 
we can drop additive constants without affecting the inferential 
machinery.
    Put another way: as long as our implicit dx is the *same* throughout 
our equations, we can ignore it.
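
    A toy illustration of the density-vs-probability point (ordinary 
normal densities, nothing mixed-model-specific): the density at a point 
can exceed 1, the probability of a small interval is approximately 
density times dx, and the log(dx) term is the same additive constant 
for every model fitted to the same data:

x  <- 1.23
dx <- 0.01    # measurement precision: the interval [1.225, 1.235]

dnorm(x, mean = 1.23, sd = 0.1)        # density: about 3.99, i.e. > 1
log(dnorm(x, mean = 1.23, sd = 0.1))   # "log-likelihood": positive

## exact probability of the interval, via the CDF ...
pnorm(x + dx/2, 1.23, 0.1) - pnorm(x - dx/2, 1.23, 0.1)
## ... is approximately density * dx; on the log scale log(dx) is an
## additive constant shared by every model, so it drops out
log(dnorm(x, 1.23, 0.1)) + log(dx)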

    The other complication is that the likelihood of a mixed model 
*does* involve an integral (but it's an integral over the random 
effects, and doesn't come into the argument above).
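
    For concreteness, here is a sketch (with made-up numbers) of the 
integral meant here, for a single cluster of a Gaussian mixed model 
with a scalar random intercept b; base R's integrate() is enough in 
one dimension:

y       <- c(4.8, 5.1, 5.4)   # made-up observations from one group
mu      <- 5                  # fixed-effect mean (taken as known here)
sigma   <- 0.5                # residual SD
sigma_b <- 1                  # random-intercept SD

marg <- integrate(
    function(b) {
        ## conditional likelihood of the cluster given b, weighted by the
        ## density of b; sapply() because integrate() passes a vector of b's
        sapply(b, function(bi) prod(dnorm(y, mu + bi, sigma))) *
            dnorm(b, 0, sigma_b)
    },
    lower = -Inf, upper = Inf)
log(marg$value)   # this cluster's contribution to the log-likelihood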

   Hope that helps.

   Ben Bolker


