[R-sig-ME] Log likelihood of a glmer() binomial model
Rolf Turner
r.turner at auckland.ac.nz
Sat Apr 20 01:22:28 CEST 2019
I am trying to implement cross-validated likelihood (see e.g. "Model
selection for probabilistic clustering using cross-validated
likelihood", P. Smyth, 2000, Statistics and Computing vol. 10, pp.
63--72) for model selection in the glmer() binomial family context.
Briefly, what I do is:
* divide the data into a "training set" and a "validation set"
(e.g. 80% and 20%)
* fit the model of interest to the training set *only*
* calculate the log-likelihood of the validation set on the
basis of the model fitted to the training set
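In code, the three steps above can be sketched roughly as follows (purely for illustration, assuming a data frame "dat" with a binary response "y", a fixed effect "x", and a grouping factor "g" -- none of these names appear in the original post):

```r
library(lme4)

## Step 1: split the data 80/20 into training and validation sets
set.seed(42)
idx <- sample(nrow(dat), size = round(0.8 * nrow(dat)))
TS  <- dat[idx, ]   # training set
VS  <- dat[-idx, ]  # validation set

## Step 2: fit the model of interest to the training set *only*
fit <- glmer(y ~ x + (1 | g), data = TS, family = binomial)

## Step 3: predicted success probabilities for the validation set;
## allow.new.levels = TRUE copes with groups absent from the training set
p <- predict(fit, newdata = VS, type = "response",
             allow.new.levels = TRUE)
```

The model formula here is just a placeholder for "the model of interest".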
It is the last step about which I have some concern. I calculate
this log likelihood as
sum(log(predict(fit, newdata = VS, type = "response")))
where "fit" is the model fitted to the training set and "VS" is the
validation set.
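For what it's worth, sum(log(p)) over the predicted probabilities uses only the probability of "success" for each observation, whereas the textbook Bernoulli log-likelihood also brings in the observed responses. A sketch of that computation (again assuming a binary response column "y" in "VS", a name not taken from the original post) would be:

```r
## p: predicted success probabilities for the validation set, e.g.
##    p <- predict(fit, newdata = VS, type = "response",
##                 allow.new.levels = TRUE)
## Bernoulli log-likelihood: sum of y*log(p) + (1 - y)*log(1 - p),
## computed here via the log density of a binomial with size 1
ll <- sum(dbinom(VS$y, size = 1, prob = p, log = TRUE))
```

Whether this conditional quantity is the appropriate "cross-validated" log likelihood in the glmer() setting is of course part of the question being asked.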
I have the uneasy feeling that I may well be doing something stupidly
naïve here, but I can't see anything obviously wrong with what I am doing.
I have observed that, if I execute
sum(log(predict(fit, type = "response")))
ostensibly calculating the log likelihood of "fit" for the data set from
which "fit" was obtained, I get a very different value from that which
is obtained from executing logLik(fit). This does not *necessarily*
imply, however, that my method is wrong, since the log likelihood is
"unique only up to an additive constant". However, I cannot see how to
work out what this additive constant might be, so that I can check.
Can anyone enlighten me as to whether my log likelihood is "correct"?
And if not, could someone suggest a *correct* means of calculating a
"cross-validated" log likelihood?
Thanks.
cheers,
Rolf Turner
--
Honorary Research Fellow
Department of Statistics
University of Auckland
Phone: +64-9-373-7599 ext. 88276