[R] GLMM(..., family=binomial(link="cloglog"))?

Tue Jun 8 16:59:58 CEST 2004

Spencer Graves <spencer.graves at pdf.com> writes:

> 	  Another GLMM/glmm problem:  I simulate rbinom(N, 100, pz),
> 	  where logit(pz) = rnorm(N).  I'd like to estimate the mean
> 	  and standard deviation of logit(pz).  I've tried GLMM{lme4},
> 	  glmmPQL{MASS}, and glmm{Jim Lindsey's repeated}.  In several
> 	  replicates of this for N = 10, 100, 500, etc., my glmm call
> 	  produced estimates of the standard deviation of the random
> 	  effect in the range of 0.6 to 0.8 (never as high as the 1
> 	  simulated).  Meanwhile, my calls to GLMM produced estimates
> 	  between 1e-12 and 1e-9, while the glmmPQL results tended to
> 	  be closer to 0.001, though it gave one number as high as
> 	  0.7.  (I'm running R 1.9.1 alpha, lme4 0.6-1 under Windows
> 	  2000)
> 
> 
> 	  Am I doing something wrong, or do these results suggest bugs
> 	  in the software or deficiencies in the theory or ... ?
> 
> 
> 	  Consider the following:
> 
>  > set.seed(1); N <- 10
>  > z <- rnorm(N)
>  > pz <- inv.logit(z)
>  > DF <- data.frame(z=z, pz=pz, y=rbinom(N, 100, pz)/100, n=100,
>  > smpl=factor(1:N))
> 
>  > GLMM(y~1, family=binomial, data=DF, random=~1|smpl, weights=n)
> Generalized Linear Mixed Model

Check the observed proportions in the data and see if they apparently
vary enough to be able to expect to estimate a random effect.

It is entirely possible to have the MLE of a variance component be
zero.

Another thing to do it to check the convergence.  Use

GLMM(y ~ 1, family = binomial, random = ~1|smpl, weigths = n, 
     control = list(EMv=TRUE, msV=TRUE))

or 

GLMM(y ~ 1, family = binomial, random = ~1|smpl, weigths = n, 
     control = list(EMv=TRUE, msV=TRUE, opt = 'optim'))

You will see that both optimizers push the precision of the random
effects to very large values (i.e. the variance going to zero) in the
second of the penalized least squares steps.

I think that this is a legitimate optimum for the approximate
problem.  It may be an indication that the approximate problem is not
the best one to use.  As George Box would tell us,

 You have a big approximation and a small approximation.  The big
 approximation is that your approximation to the problem you want to
 solve.  The small approximation is involved in getting the solution
 to the approximate problem.

For this case, even if I turn off the PQL iterations and go directly
to the Laplacian approximation I still get a near-zero estimate of the
variance component.  You can see the gory details with

GLMM(y ~ 1, family = binomial, random = ~1|smpl, weigths = n,
     control = list(EMv=TRUE, msV=TRUE, glmmMaxIter = 1), method = 'Laplace')

I am beginning to suspect that for these data the MLE of the variance
component is zero.

> 	  So far, the best tool I've got for this problem is a normal
> 	  probability plot of a transform of the binomial responses
> 	  with Monte Carlo confidence bands, as suggested by Venables
> 	  and Ripley, S Programming and Atkinson (1985).  However, I
> 	  ultimately need to estimate these numbers.

I think the most reliable method of fitting this particular form of a
GLMM is using adaptive Gauss-Hermite quadrature to evaluate the
marginal likelihood and optimizing that directly.  In this model the
marginal likelihood is a function of two parameters.  If you have
access to SAS I would try to fit these data with PROC NLMIXED and see
what that does.  You may also be able to use Goran Brostrom's package
for R on this model.  As a third option you could set up evaluation of
the marginal likelihood using either the Laplacian approximation to
the integral or your own version of adaptive Gauss-Hermite and look at
the contours of the marginal log-likelihood.