[R-sig-ME] Log likelihood of a glmer() binomial model .

Sun Apr 21 18:27:29 CEST 2019

Rolf: I'm no Prof but a lowly grad student of an unrelated field, so take
all my input with a grain of salt.

As alluded to by Ben, predict() can certainly provide fitted probabilities
for the validation set with the random effects taken into account. This is
achieved by the re.form = NULL argument. However -- and I'll be happy to be
corrected on this -- the problem is that dbinom() will calculate a
(log)likelihood of the observed responses assuming a regular binomial PMF,
which does not apply in the case of a mixed-effects model. Thus the result
will not equal the loglikelihood that is maximized in the fitting process,
i.e. will not equal logLik(validationFit) unless it's a standard logistic
GLM.

Best,

J

su 21. huhtik. 2019 klo 2.03 Rolf Turner (r.turner using auckland.ac.nz)
kirjoitti:

>
> On 21/04/19 2:51 AM, Ben Bolker wrote:
>
> >    I agree with Juho that there's a typo -- would be something like.
> >
> >   sum((y==1)*log(pred_prob) + (y==0)*log(1-pred_prob))
>
> Whoops.  Not a typo as such but a failure (lapse) in my understanding.
> I was (stupidly) thinking that predict(...,type="response") would
> produce Pr(Y = y_i | predictors_i) but *of course* (!!!) it doesn't ---
> it produces Pr(success | predictors-i).  So I need to invoke dbinom(),
> as Prof. Ruohonen has pointed out.  I think I "really" knew that, but
> somehow my brain slipped a cog.
>
> Sorry for the confusion.
>
> That notwithstanding, (again as Prof. Ruohonen pointed out) this is
> still not right since predict() produces Pr(success | predictors_i, R =
> 0) whereas I need Pr(success | predictors_i) with the random effect R
> "integrated out".
>
> >    As far as what predict() does: depending on the value of re.form, it
> > either gives a population-level prediction (R=0) or a conditional
> > prediction (R set to its conditional mode).  You might be looking for
> > *marginal* predictions (which Rizopoulous's GLMMadaptive package can do
> ...)
>
> Oh dear.  I would then have to switch over from using lme4 to using
> GLMMadaptive ....  Maybe I'll have to climb that mountain.  Or maybe
> I'll go with the "slow" method of calculating cross-validated likelihood
> that Ben Bolker outlined in a previous email.
>
> cheers,
>
> Rolf
>
> > On 2019-04-20 3:42 a.m., Juho Kristian Ruohonen wrote:
> >> Rolf: Forgive my ignorance, but isn't the relevant log-likelihood here
> >> the log-likelihood of the observed responses in the validation set given
> >> the model-predicted probabilities? I.e. sum(dbinom(VS$y, size = VS$n,
> >> prob = predict(fit, newdata = VS, type = "response"), log = TRUE))? Even
> >> this would be somewhat off because dbinom() isn't aware of the
> >> random-effects integral business. But it looks to me like your current
> >> call is calculating the log-sum of the predicted probabilities of y = 1
> >> in the validation set, not the loglikelihood of the observed responses
> >> in the validation set.
> >>
> >> la 20. huhtik. 2019 klo 8.51 Rolf Turner (r.turner using auckland.ac.nz
> >> <mailto:r.turner using auckland.ac.nz>) kirjoitti:
> >>
> >>
> >>      On 20/04/19 12:44 PM, Ben Bolker wrote:
> >>
> >>      >    This seems wrong.
> >>
> >>      Yeah, that figures.
> >>
> >>      > The GLMM log-likelihood includes an integral over
> >>      > the distribution of the random effects.
> >>
> >>      I was aware of this.  I guess what I was naïvely expecting was that
> >>      predict.merMod() would handle this.  I.e. that this predict method
> >>      (with type = "response") would return, for each observed y_i in the
> >>      (new) data set
> >>
> >>         Pr(Y = y_i) = \int_0 Pr(Y = y_i | R = r) f(r) dr
> >>
> >>      where R is the vector of random effects and f(r) is its probability
> >>      density function (multivariate normal, with mean 0 and some
> covariance
> >>      matrix, which has been estimated in the fitting process.
> >>
> >>      I guess that this is *not* what predict.merMod() returns --- but I
> >>      don't
> >>      understand why not.  It seems to me that this is what it "should"
> >>      return.
> >>
> >>      I'm probably misunderstanding something, possibly simple, possibly
> >>      subtle.
> >>
> >>      Apropos of nothing much, what *does* predict.merMod() return?
> >>      Maybe Pr(Y = y_i | R = 0) ???
> >>
> >>      <SNIP>
> >>      >   Here is an **inefficient** method for computing the likelihood
> >>      >
> >>      >     coefs <- unlist(getME(fit,c("theta","beta"))
> >>      >     newdev <- update(fit, data=VS, devFunOnly=TRUE)
> >>      >     newdev(coefs)
> >>      >
> >>      > This is slow because it has to reconstruct all of the
> random-effects
> >>      > matrices, do permutations to reorder the relevant matrices to be
> as
> >>      > sparse as possible, etc. etc.
> >>
> >>      Thanks for this.  I'll give it a go.  I think that the slowness
> may not
> >>      be an overwhelming drawback.  Anyhow I shall try to test it out.
> >>
> >>      Thanks again.
> >>
> >>      cheers,
> >>
> >>      Rolf
> >>
> >>      --
> >>      Honorary Research Fellow
> >>      Department of Statistics
> >>      University of Auckland
> >>      Phone: +64-9-373-7599 ext. 88276
> >>
> >>      _______________________________________________
> >>      R-sig-mixed-models using r-project.org
> >>      <mailto:R-sig-mixed-models using r-project.org> mailing list
> >>      https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
> >>
>
>
> --
> Honorary Research Fellow
> Department of Statistics
> University of Auckland
> Phone: +64-9-373-7599 ext. 88276
>

	[[alternative HTML version deleted]]