[R-sig-ME] compare fit of GLMM with different link/family

Wed Feb 2 18:36:18 CET 2022

Ben Bolker writes:
 >    Getting back to this late.

Thanks for getting back to it.

 > I think this all basically makes sense.  I would phrase it as
 > saying that what we are doing when we calculate the
 > "(log)likelihood" of a *continuous* response is in practice
 > calculating a (log) likelihood *density* (that's why the value can
 > be >1); as Phillip suggests, if we write it out as a likelihood
 > then there is an implicit 'delta-x' in the expression that makes it
 > a probability.  When we take the log that turns into an additive
 > constant, and we know that we can drop additive constants without
 > affecting the inferential machinery.

Do you include AIC as part of the inferential machinery?
To me AIC makes sense if loglik is really the log of a probability
(which would make it a measure of information).
And the probability depends on these deltas.  Which are not 
necessarily the same for all values of the response variable.
I can see the AIC reported as a sort of "AIC density" which
has to be multiplied by the size of a neighborhood representing
the intervals for all the outputs.
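
To make the "AIC density" idea concrete, here is a minimal sketch.
Everything in it is an assumption for illustration only: the data,
the two normal models with fixed (not fitted) parameters, the
parameter count of 2, and the common interval width of 0.01.

```python
import math

def norm_logpdf(y, mu, sigma):
    """Log of the normal density at y."""
    u = (y - mu) / sigma
    return -0.5 * u * u - math.log(sigma) - 0.5 * math.log(2.0 * math.pi)

def aic(loglik, n_params):
    """AIC = 2k - 2 * loglik."""
    return 2 * n_params - 2 * loglik

# Made-up responses and made-up interval widths (one delta per response).
ys = [2.1, 2.9, 3.4, 4.2, 5.0]
deltas = [0.01] * len(ys)
log_volume = sum(math.log(d) for d in deltas)  # log of the product of deltas

# Two hypothetical models, parameters fixed by hand rather than fitted.
ll_a = sum(norm_logpdf(y, 3.5, 1.0) for y in ys)
ll_b = sum(norm_logpdf(y, 3.0, 2.0) for y in ys)

# "AIC density": computed from the log-likelihood density as reported.
aic_density_a = aic(ll_a, 2)
aic_density_b = aic(ll_b, 2)

# "AIC probability": density times interval widths, i.e. add log_volume.
aic_prob_a = aic(ll_a + log_volume, 2)
aic_prob_b = aic(ll_b + log_volume, 2)

print(aic_density_a - aic_density_b, aic_prob_a - aic_prob_b)
```

Under these assumptions each AIC shifts by -2 * log_volume, so the
difference between the two models is unchanged as long as both use
the same responses and the same deltas.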

 > Put another way, as long as our implicit dx is the *same*
 > throughout our equations, we can ignore it.

Equations?  I think it's ok, at least for this argument, to 
imagine that the independent variables are exact and that the 
model is exact, but if you want to compute a probability of the
output (dependent variable) then considering the output values 
exact would mean a probability of zero for any continuous 
distribution.  The probability of the output makes sense if
the outputs are all changed to ranges reflecting measurement
error (and for that matter the information content of the
outputs).  If all of the values of the dependent variable have 
the same delta (which would be unusual if the measurements 
cover several orders of magnitude), and assuming that delta
is small enough so that the PDF doesn't change much over
any of those ranges, then the probability would be
  your "density" likelihood function * (delta ^ #datapoints).
Your loglik density could be viewed as the log probability if all
of the deltas = 1, and when loglik>0 then clearly the PDF
changes a lot over that delta.
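
A rough numerical check of that formula, as a sketch only: the
normal model, its parameters, the data and the two delta values are
all made up, and the helper names are hypothetical.

```python
import math

def norm_pdf(x, mu, sigma):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def norm_cdf(x, mu, sigma):
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def loglik_density(ys, mu, sigma):
    """Sum of log densities: the usual reported 'loglik'."""
    return sum(math.log(norm_pdf(y, mu, sigma)) for y in ys)

def logprob_density_approx(ys, mu, sigma, delta):
    """log( density likelihood * delta ^ #datapoints )."""
    return loglik_density(ys, mu, sigma) + len(ys) * math.log(delta)

def logprob_interval(ys, mu, sigma, delta):
    """Log probability that each output lies in [y - delta/2, y + delta/2]."""
    half = delta / 2.0
    return sum(
        math.log(norm_cdf(y + half, mu, sigma) - norm_cdf(y - half, mu, sigma))
        for y in ys
    )

ys = [-0.8, -0.1, 0.3, 1.2]   # made-up outputs
MU, SIGMA = 0.0, 1.0          # made-up model

# Small delta: the PDF is nearly flat over each interval.
approx_small = logprob_density_approx(ys, MU, SIGMA, 0.01)
exact_small = logprob_interval(ys, MU, SIGMA, 0.01)

# delta = 1: the approximation reduces to the loglik density itself.
approx_one = logprob_density_approx(ys, MU, SIGMA, 1.0)
exact_one = logprob_interval(ys, MU, SIGMA, 1.0)

print(approx_small - exact_small, approx_one - exact_one)
```

With the small delta the two numbers agree closely; with delta = 1
the gap is visibly larger, because the PDF is no longer close to
constant over each interval.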

So you're assuming several conditions that are not always met.
Or making several approximations that can sometimes be good and 
other times be bad.  (I guess you'll say, yes, and a lot more
assumptions/approximations besides!)

Especially when the output values range over several orders of
magnitude it seems unlikely that they represent intervals of
the same size and much more likely that the logs represent
intervals of similar sizes.  
In any case, the intervals can't be the same sizes for both 
the original outputs and their logs.

It occurs to me that your approximation of just measuring the 
pdf at one point assumes the same value for the entire interval, 
which makes all of the deltas independent. The formula above
could be claimed to account for different deltas by replacing
delta ^ #datapoints with the product of the deltas.
But really each interval should be evaluated on the CDF to 
account for the fact that the PDF changes over the interval.
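
Here is a small sketch of the difference between the two. The
exponential distribution is chosen only because its CDF is simple;
the rate, the outputs and the choice of deltas proportional to the
outputs (10% and 100% of each value) are all assumptions.

```python
import math

RATE = 0.05  # hypothetical rate of an exponential model for the output

def exp_logpdf(y, rate=RATE):
    return math.log(rate) - rate * y

def exp_cdf(y, rate=RATE):
    return 1.0 - math.exp(-rate * y)

def logprob_product_of_deltas(ys, deltas):
    """log( prod_i pdf(y_i) * delta_i ): the PDF evaluated at one point."""
    return sum(exp_logpdf(y) + math.log(d) for y, d in zip(ys, deltas))

def logprob_cdf(ys, deltas):
    """log( prod_i [CDF(y_i + delta_i/2) - CDF(y_i - delta_i/2)] )."""
    total = 0.0
    for y, d in zip(ys, deltas):
        lo, hi = y - d / 2.0, y + d / 2.0
        total += math.log(exp_cdf(hi) - exp_cdf(lo))
    return total

# Made-up outputs spanning two orders of magnitude.
ys = [0.5, 5.0, 50.0]
narrow = [0.1 * y for y in ys]   # intervals 10% as wide as the value
wide = [1.0 * y for y in ys]     # intervals as wide as the value

gap_narrow = logprob_cdf(ys, narrow) - logprob_product_of_deltas(ys, narrow)
gap_wide = logprob_cdf(ys, wide) - logprob_product_of_deltas(ys, wide)

print(gap_narrow, gap_wide)
```

For the narrow intervals the product-of-deltas formula is close to
the CDF answer; for the wide ones the gap is no longer negligible,
and nearly all of it comes from the largest output, whose interval
spans a range over which the PDF changes a lot.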

Do you agree that the probabilities I would compute with 
ranges of outputs would be comparable to those I'd compute
with log transformed ranges in another model?  I had the
impression before that this was the same correction as the 
one you've referred to (not quite described).  But it now
occurs to me that your correction may only be correcting for
the size of the intervals, and not also for the changes in
the PDF over the intervals.  
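
To show what I mean, here is a sketch under the assumption that the
correction is the usual change-of-variables (Jacobian) term, which
for a log transform subtracts the sum of log(y) from the loglik
computed on the log scale. The model (log(y) normal), its
parameters, the data and the interval width are all made up.

```python
import math

def norm_logpdf(z, mu, sigma):
    u = (z - mu) / sigma
    return -0.5 * u * u - math.log(sigma) - 0.5 * math.log(2.0 * math.pi)

def norm_cdf(z, mu, sigma):
    return 0.5 * (1.0 + math.erf((z - mu) / (sigma * math.sqrt(2.0))))

MU, SIGMA = 1.0, 1.5            # hypothetical model: log(y) ~ Normal(MU, SIGMA)
ys = [0.2, 3.0, 40.0, 500.0]    # made-up outputs over several orders of magnitude
DELTA = 0.001                   # same interval width for every output, on the y scale
n = len(ys)

# Probability of the ranges, evaluated with the CDF on log-transformed endpoints.
exact = sum(
    math.log(
        norm_cdf(math.log(y + DELTA / 2.0), MU, SIGMA)
        - norm_cdf(math.log(y - DELTA / 2.0), MU, SIGMA)
    )
    for y in ys
)

# Loglik density as a model for log(y) would report it.
loglik_log_scale = sum(norm_logpdf(math.log(y), MU, SIGMA) for y in ys)

# Jacobian of z = log(y) is 1/y, so the y-scale log density is log f(z) - log(y).
jacobian = -sum(math.log(y) for y in ys)
loglik_original_scale = loglik_log_scale + jacobian

with_correction = loglik_original_scale + n * math.log(DELTA)
without_correction = loglik_log_scale + n * math.log(DELTA)

print(exact, with_correction, without_correction)
```

In this sketch the corrected value matches the CDF-based probability
closely, which suggests the Jacobian term accounts for the interval
sizes (an interval of width delta at y has width about delta / y on
the log scale) but, like any single-point evaluation, not for
changes in the PDF over wide intervals.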

Where can I find more details on the correction for 
transformation of the output variable?

 >  The other complication is that the likelihood of a mixed model
 > *does* involve an integral (but it's an integral over the random
 > effects, and doesn't come into the argument above).

I think I understand that random effects involve an integral,
but I don't yet see the complication that introduces.
Perhaps it's related to how that integral is evaluated?