[R-sig-ME] Log likelihood of a glmer() binomial model

Rolf Turner r.turner at auckland.ac.nz
Sat Apr 20 01:22:28 CEST 2019


I am trying to implement cross-validated likelihood (see e.g. "Model 
selection for probabilistic clustering using cross-validated 
likelihood", P. Smyth, 2000, Statistics and Computing vol. 10, pp. 
63--72) for model selection in the glmer() binomial family context.

Briefly, what I do is the following (see the code sketch after this list):

    * divide the data into a "training set" and a "validation set"
      (e.g. 80% and 20%)
    * fit the model of interest to the training set *only*
    * calculate the log-likelihood of the validation set on the
      basis of the model fitted to the training set
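
A minimal sketch of the first two steps, purely for concreteness: it uses
the cbpp data that ships with lme4 as a stand-in for the real data, and an
illustrative model formula rather than the actual analysis.

     library(lme4)
     data(cbpp)   # binomial example data from lme4, standing in for my data

     set.seed(1)
     i  <- sample(nrow(cbpp), size = round(0.8 * nrow(cbpp)))
     TS <- cbpp[i, ]    # "training set"   (80%)
     VS <- cbpp[-i, ]   # "validation set" (20%)

     ## Fit the model of interest to the training set *only*.
     fit <- glmer(cbind(incidence, size - incidence) ~ period + (1 | herd),
                  data = TS, family = binomial)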

It is the last step about which I have some concern.  I calculate
this log likelihood as

     sum(log(predict(fit, newdata = VS, type = "response")))

where "fit" is the model fitted to the training set and "VS" is the 
validation set.
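
In the notation of the sketch above, this amounts to the following (the
allow.new.levels = TRUE argument is there only because a random 80/20 split
of the rows may leave some grouping-factor levels out of the training set):

     ## Response-scale predictions for the validation set, based on the
     ## model fitted to the training set.
     p <- predict(fit, newdata = VS, type = "response",
                  allow.new.levels = TRUE)
     sum(log(p))   # the quantity I am treating as the CV log likelihood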

I have the uneasy feeling that I may well be doing something stupidly 
naïve here, but I can't see anything obviously wrong with what I am doing.

I have observed that, if I execute

     sum(log(predict(fit, type = "response")))

ostensibly calculating the log likelihood of "fit" for the very data set to 
which "fit" was fitted, I get a value quite different from the one returned 
by logLik(fit).  This does not *necessarily* imply that my method is wrong, 
since the log likelihood is "unique only up to an additive constant".  
However, I cannot see how to work out what that additive constant should be, 
so I have no way to check.
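
For reference, the comparison in question is simply:

     sum(log(predict(fit, type = "response")))   # my hand-rolled quantity
     logLik(fit)                                 # what lme4 reports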

Can anyone enlighten me as to whether my log likelihood is "correct"?

And if not, could someone suggest a *correct* means of calculating a 
"cross-validated" log likelihood?

Thanks.

cheers,

Rolf Turner

-- 
Honorary Research Fellow
Department of Statistics
University of Auckland
Phone: +64-9-373-7599 ext. 88276


