[R-sig-ME] How to compute deviance of a (g)lmer model under different values of the parameters ? Partial approximate answer

Tue May 5 23:02:24 CEST 2009

Answering to myself, at least for archive users' sake :

I have a partial approximate solution, which seems to work (= gives "not
seemingly unreasonable answers") for *linear* mixed effect models.

For these models, the deviance (= -2 logLik(themodel)) is easily
computed as the sum on all observations of the squared residuals. This
residual r is the difference between the observed value y and the
estimate by the model y.hat.

If, as usual, we denote the fixed effects model matrix by X, beta the
unknown fixed effects coefficients, with estimator beta.hat, Z the
random effects model matrix and b the unknown random effects
coefficients with estimator b.hat, we all know that

y.hat = X%*%beta.hat + Z%*%b.hat (1)

Therefore, r = y-y.hat = y-y.hat = y-(X%*%beta.hat + Z%*%b.hat). (2)

The problem is that Z%*%b.hat might be *quite* complicated to rebuilt
from ranef(model) and Z.

Since we seek a solution allowing to compute the LL of the model under a
new estimator of beta (say beta.hat.new) *computed while ignoring the
random effects beyond what is reflected in beta.hat variances*, it seems
reasonable to ignore what modifications to Z%*%b.hat an adjustment of
the model to the new fixed effect estimators beta.hat.new would entail,
and use (2) with the original values to compute the new "residuals". To
be precise :

a) Zbh=y.hat-X%*%beta.hat (computed from the original model)
b) r.new=y-(X%*%beta.hat.new+Zbh) (in the supposed new model)

which can be "onelined", of course... at the expense of clarity.
Computing deviance an logLik is trivial afterwards...

Two questions :
	a) What do you think ? Am I crazy like a fox ? (I wouldn't mind being
crazy like a Fox(*) ... :)
	b) Could such a trick be applied to *generalized* linear models ?
Computing the deviance from the residuals doesn't seem as easy as for
linear models...

Sincerely,

					Emmanuel Charpentier

(*) with apologies to John F, who has probably read|heard it a lot of
times... 

Le mardi 05 mai 2009 à 09:37 +0200, Emmanuel Charpentier a écrit :
> Dear list,
> 
> I'm trying to implement multiple imputations analysis of datasets with
> missing values for mixed models. My interest goes mainly to the fixed
> effects.
> 
> As you probably know, the principle of such an analysis is to impute
> "reasonable" values to missing data a few times (with "reasonable"
> variability), analyze these few completed datasets and pooling the
> analyses.
> 
> Schafer(1999) states simple methods for pooling estimates of the model
> and their variances. These methods can be applied to mixed models (with
> a few snags : you need to *ignore* the variation of random effects
> between imputed datasets beyond what is reflected in fixed effects' SE,
> you need to guesstimate the residuals DoF, which is something Douglas
> Bates has expressed grave doubts about). As it stands, these methods
> give results which, at first look, does not seem unreasonable, and might
> be used as a first approximation. More on this later.
> 
> My current goal is to build a function to build a "pooled test" for
> likelihood ratio of two models. Such a test has been proposed by Meng &
> Rubin (Biometrika, 1992), and, according to a presentation by RA
> Medeiros, has been implemented in Stata under the name milrtest.
> 
> I'm trying to implement such a pooled test in R. My snag is that the
> estimate is based on the estimation, for each imputed dataset, of the
> (log)likelihood ratio of each of these datasets under the hypotheses of
> the coefficients having precisely the values obtained in the coefficient
> pooling step.
> 
> So what I need is a (if possible elegant and/or fast) way to compute
> log-likelihood of a given dataset under a model (already built) under
> this hypothesis, up to a quantity supposed constant between models.
> 
> The logLik(model) function will happily give me LL(model|parameters
> giving makimum LL). What I need is something like logLik(model, coefs)
> giving me (up to an additive constant) LL(model|parameters=coefs).
> 
> Do you have any suggestions ? I can always sum squares of residuals
> recomputed under this alternative hypothesis, but I'm not quite sure
> that's enough for my purposes, especially if I plan to include the
> random effect estimates later...
> 
> 					Emmanuel Charpentier
>