[R-sig-ME] compare fit of GLMM with different link/family

Thu Jan 27 22:46:08 CET 2022

Phillip Alday writes:

 > @Don: I think the part you're missing is that the likelihood
 > depends on the data and if you transform the data (e.g. via log),
 > then you've changed the data and now have a different likelihood.

I'm not sure what you mean by changing the data, but the fact that
you change the likelihood seems to be just as true for any other 
change to the model.
 log(output) ~ input
and
 output ~ input
are two different models just like they're both different from
 output ~ input^2

 > precisely: the likelihood of the model is the probability of the
 > parameters _conditional_ on the data.[*]

[I assume by parameters you mean what I call the output (dependent variable)
and by the data you mean what I call the inputs (the independent variables).]
But this gets back to my argument below that the likelihood is not really
the same as probability...

 > For linear transformations of the data, everything is fine,

But my example above with input^2 was not a linear transformation of the 
data, was it?  You don't think it's fair to compare loglik of
 output ~ input  with that of  output ~ input^2  ?
Oh, I guess not - that's your argument about nested models.
But I also don't understand that.

It seems to me that conditional probability of output given model and
input is a measure of how well the output fits the input+model and it
makes sense to compare that even for different combinations of
input, output, model.  I see that more rows of data will inevitably
reduce that probability, so perhaps a good measure would be to divide
log of prob by #rows, i.e., average log of probability per row.
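
A minimal sketch of the per-row average, in Python rather than R so it
runs with the standard library alone. The fixed normal model and the
four rows are made-up values, and the quantity summed is the log
density, which is what I believe loglik reports for a continuous output:

```python
from math import log
from statistics import NormalDist

model = NormalDist(mu=0.0, sigma=1.0)   # assumed fixed model, not a fitted one
rows = [-0.5, 0.2, 1.1, -1.3]           # made-up outputs

def total_logdensity(values):
    """Sum of log densities of the values under the fixed model."""
    return sum(log(model.pdf(v)) for v in values)

small = total_logdensity(rows)
large = total_logdensity(rows * 3)      # the same rows repeated three times

# Dividing by the number of rows removes the dependence on sample size.
per_row_small = small / len(rows)
per_row_large = large / (3 * len(rows))

print(small, large)
print(per_row_small, per_row_large)
```

With these numbers the total drops as rows are added while the per-row
average stays the same.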

 > but for nonlinear transformations, you need to take into account
 > the distortion they introduce on the parameter space, which is what
 > the Jacobian does. Digging down a bit deeper, the likelihood is
 > ultimately an integral and any transformation of the data

I thought the likelihood was computed by just evaluating the PDF.
Is that necessarily an integral ?  Is that related to your 
description of treating the response as a distribution?

What you write above does not convey to me exactly what problem is
being solved or how it's being solved, but I get the feeling that your
transformation might be the same thing I was complaining about.
See what you think:

My complaint is illustrated by the fact that the loglik can be
positive - because the pdf can be > 1.  Whereas the actual probability
could be computed by changing the output value to a range and taking
the difference between the values of the cdf at the two ends of the
range (maybe you'd call that integration).  If you did that, say, for
an output of 1.23, which I'd require you to change to an interval, say
[1.225, 1.235], then in order to compare the REAL probability (rather
than the likelihood) of this model to that of another model using
log(output), the interval would become [log(1.225), log(1.235)],
right?  Does that seem to correspond to your correction?
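
Here is a small sketch of that interval calculation, in Python with the
standard library only. The two normal distributions are hypothetical
stand-ins for what each model predicts for this one row; their means and
standard deviations are assumptions chosen for illustration, not fitted
values:

```python
from math import log
from statistics import NormalDist

y, half_width = 1.23, 0.005
lo, hi = y - half_width, y + half_width

# Hypothetical predictive distributions for this one row (assumed values).
raw_model = NormalDist(mu=1.2, sigma=0.1)         # model for output
log_model = NormalDist(mu=log(1.2), sigma=0.08)   # model for log(output)

# Actual probability of the interval under each model, via the CDF.
p_raw = raw_model.cdf(hi) - raw_model.cdf(lo)
p_log = log_model.cdf(log(hi)) - log_model.cdf(log(lo))

# Density-based approximations of the same probabilities.
width = hi - lo
approx_raw = raw_model.pdf(y) * width
approx_log = log_model.pdf(log(y)) / y * width    # 1/y is d log(y) / dy
naive_log = log_model.pdf(log(y)) * width         # same, without the 1/y factor

print(p_raw, approx_raw)
print(p_log, approx_log, naive_log)
```

If I have this right, the interval probability for the log model is
matched by its density only after multiplying by 1/y, which looks like
the Jacobian being described; without that factor the density-based
value is off by a factor of about y.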

 > (For linear transformations, you can still be off by a
 > multiplicative constant, but that doesn't matter for finding the
 > location of the optimum, i.e. the parameters corresponding to the
 > maximum likelihood.)

Again I might not be following you, but I think this may be related to
the fact that loglik can be positive -- which means to me that even
though you've found the optimal estimates, your loglik is NOT a
reasonable estimate of the PROBABILITY of the output given the input +
model.  And for model comparison I would want the log of the
probability, not something that could be off by some (arbitrarily
large) constant that might be different for different models.
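
To make the constant concrete, here is a sketch in Python (standard
library only) that fits an intercept-only normal model to the same
made-up measurements in two units. The data and the helper name
max_loglik are illustrative assumptions:

```python
from math import log
from statistics import NormalDist, fmean, pstdev

def max_loglik(values):
    """Log-likelihood of an intercept-only normal model at its ML estimates."""
    fitted = NormalDist(fmean(values), pstdev(values))
    return sum(log(fitted.pdf(v)) for v in values)

metres = [0.012, 0.015, 0.011, 0.014, 0.013]    # made-up data
millimetres = [1000 * v for v in metres]        # same data, different units

ll_m = max_loglik(metres)
ll_mm = max_loglik(millimetres)
shift = len(metres) * log(1000)                 # n * log(scale factor)

print(ll_m, ll_mm, ll_m - ll_mm, shift)
```

The loglik comes out positive in metres and negative in millimetres, and
the two differ by n * log(1000), even though the fit is the same in both
cases.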

So if loglik is computed as I think it is, then it's questionable
whether it can be compared between different models at all, whereas
if log prob were computed as I describe, then it would make sense to
compare it for any two models, even if the output were transformed.

I hope that makes sense?

Or, of course, tell me where I've gone wrong.