[R] Variance component estimation in glmmPQL

Tue Nov 27 15:43:11 CET 2012

Hi all,

I've been attempting to fit a logistic glmm using glmmPQL in order to
estimate variance components for a score test, where the model is of the
form logit(mu) = X*a+ Z1*b1 + Z2*b2.  Z1 and Z2 are actually reduced rank
square root matrices of the assumed covariance structure (up to a constant)
of random effects c1 and c2, respectively, such that b1 ~ N(0,sig.1^2*I) and
c1 ~ N(0,sig.1^2*K1) , where K1 = Z1*t(Z1), and  c1 = Z1*b1.

The model form I've been using is just the following:

m<-glmmPQL(y~1,random=list(f=pdBlocked(list(pdIdnot(~Z.1-1),pdIdnot(~Z.2-1))))
    ,family=binomial(link="logit"))

I've been extracting the variance components using VarCorr(), but I've
noticed that the reported variances associated with my random effects are
not even close to the values I get if I evaluate their variances empirically
(eg var(random.effects(m.12)).  I know that's not how they're actually
estimated, but there may be a whole order of magnitude difference in the
values.

Below is an example under R 2.14 on a linux machine:

library(MASS)
library(mgcv)
library(boot)

set.seed(1234)

G.1<-matrix(rnorm(5000,0,0.25),nrow=100)
G.2<-matrix(rnorm(5000,0,0.25),nrow=100)

K.1<-G.1%*%t(G.1)
K.2<-G.2%*%t(G.2)

Z.1<-mroot(K.1,method="svd")
Z.2<-mroot(K.2,method="svd")

b.1<-matrix(rnorm(ncol(Z.1),0,0.25),ncol=1)
b.2<-matrix(rnorm(ncol(Z.2),0,0.50),ncol=1)

p<-inv.logit(Z.1%*%b.1+Z.2%*%b.2)

y<-rbinom(100,1,p)
f<-rep(1,100)

m.fit<-glmmPQL(y~1,random=list(f=pdBlocked(list(pdIdent(~Z.1-1),pdIdent(~Z.2-1)))),
  family=binomial(link="logit"))

VarCorr(m.fit)
var(as.numeric(random.effects(m.fit))[1:ncol(Z.1)])
var(as.numeric(random.effects(m.fit))[-c(1:ncol(Z.1))])

>From the above, VarCorr() results in variance component estimates of sig.1^2
= 0.444 and sig.2^2  = 0.2778, whereas the empirical estimates are sig.1^2 =
0.2060 and sig.1^2 = 0.097.  I know variance component estimation in general
is a little shaky, but my simulations suggest that VarCorr() is extracting
values that are way too large on a consistent basis.

I'm largely assuming I'm misinterpreting something here, just can't figure
out what.

-Nick

--
View this message in context: http://r.789695.n4.nabble.com/Variance-component-estimation-in-glmmPQL-tp4650964.html
Sent from the R help mailing list archive at Nabble.com.