[R-sig-ME] Constraining error variance to 0 in a mixed model.

Fri Jan 11 03:30:23 CET 2019

Let me start off by apologising for beating this issue to death on this 
mailing list.  I have raised it (without getting satisfactory answers) 
on several occasions.  If you are fed up to the teeth with my maundering 
s, please just press the delete button.

This latest occasion was triggered by my trying to learn to use the 
glmmTMB function (from the package of the same name).  I noticed (to my 
initial delight) that glmmTMB() has an argument "dispformula" and the 
help says (in particular) "In Gaussian mixed models, dispformula=~0 
fixes the [dispersion] parameter to be 0, forcing variance into the 
random effects."

I said "Ah-ha!  The very thing."  So I tried it out, using the following 
data set (supplied by Ben Pelzer in a posting to this list, and 
previously used by my very good self to illustrate the problem that I am 
trying to solve):

Dat <- structure(list(person = c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L,
4L, 4L, 4L, 5L, 5L, 5L, 6L, 6L, 6L, 7L, 7L, 7L, 8L, 8L, 8L, 9L,
9L, 9L, 10L, 10L, 10L, 11L, 11L, 11L, 12L, 12L, 12L, 13L, 13L,
13L, 14L, 14L, 14L, 15L, 15L, 15L, 16L, 16L, 16L, 17L, 17L, 17L,
18L, 18L, 18L, 19L, 19L, 19L, 20L, 20L, 20L), occasion = structure(c(1L,
2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L,
3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L,
1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L,
2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L), .Label = c("1",
"2", "3"), class = "factor"), test = c(25L, 21L, 27L, 27L, 19L,
23L, 20L, 18L, 18L, 26L, 33L, 46L, 25L, 36L, 47L, 36L, 35L, 41L,
30L, 30L, 37L, 23L, 21L, 19L, 39L, 37L, 33L, 29L, 36L, 49L, 29L,
34L, 44L, 39L, 32L, 30L, 26L, 32L, 36L, 33L, 21L, 8L, 30L, 34L,
34L, 27L, 34L, 39L, 41L, 40L, 44L, 37L, 34L, 34L, 23L, 26L, 30L,
31L, 28L, 27L)), row.names = c(NA, -60L), class = "data.frame")

I tried three ways of "analysing" these data:

(1) Simple-minded multivariate analysis:

X   <- matrix(Dat$test,byrow=TRUE,ncol=3)
colnames(X) <- paste0("occasion",1:3)
mu  <- apply(X,2,mean)
Sig <- var(X)/20 # X has 20 rows

(2) Using lmer():
library(lme4)
fit2 <- lmer(test ~ 0 + occasion + (0 + occasion | person),data=Dat,
              control=lmerControl(check.nobs.vs.nRE = "ignore"))

(The "control" argument was gathered from a posting by Maarten Jung, on 
which I have previously commented.)

Note that this "works" but throws (as I have previously noted)
a disconcerting warning message:

> Warning message:
> In checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv,  :
>   Model is nearly unidentifiable: large eigenvalue ratio
>  - Rescale variables?

(3) Using glmmTMB():
library(glmmTMB)
fit3 <- glmmTMB(test ~ 0 + occasion + (0 + occasion | person),data=Dat,
                 dispformula = ~0)

(No warning message; initially I said "Wheeeee!!!")

The "parameter estimates" mu, fixef(fit2) and fixef(fit3) all agree:

occasion1 occasion2 occasion3
     29.80     30.05     33.30

The covariance Sig and vcov(fit2) agree modulo numerical noise:

 > Sig
           occasion1 occasion2 occasion3
occasion1 1.7821053  1.150526 0.5768421
occasion2 1.1505263  2.249868 3.0623684
occasion3 0.5768421  3.062368 5.8531579

 > vcov(fit2)
3 x 3 Matrix of class "dpoMatrix"
           occasion1 occasion2 occasion3
occasion1 1.7821054  1.150526 0.5768422
occasion2 1.1505265  2.249868 3.0623683
occasion3 0.5768422  3.062368 5.8531577

However vcov(fit3) differs; after some head scratching I realised that 
it is equal to 19/20 times the others.  After a little more head 
scratching I said ah-ha!  The glmmTMB() function sets REML=FALSE by
default.  So I'll just call it with REML=TRUE:

fit3 <- glmmTMB(test ~ 0+occasion + (0+occasion | person),data=Dat,
                 dispformula = ~0,REML=TRUE)

And now vcov(fit3) more-or-less agrees with the other results:

 > vcov(fit3)
Conditional model:
           occasion1 occasion2 occasion3
occasion1 1.7819337  1.150276 0.5765386
occasion2 1.1502758  2.249755 3.0623895
occasion3 0.5765386  3.062390 5.8534164

But, to my extreme irritation, glmmTMB() now also throws a warning:

> Warning message:
> In fitTMB(TMBStruc) :
> Model convergence problem; false convergence (8). See vignette('troubleshooting')

(I've looked at the vignette, and I can sort of see a connection, but 
not really.)

*Why* does the bloody universe always do this sort of thing to me?  Why 
on earth should dividing by 19 rather than 20, effectively, cause such a 
warning to be thrown?

Isn't something a little out of whack here?

I reiterate at this point a remark that I made in a previous post on the 
issue of constraining the error variance to 0:

> What we have here is a "boundary" or "edge" or "corner" case, and I
> have heard it asserted by someone knowledgeable and respected (can't
> remember who, it was a long while ago) that in testing your software it
> is crucial to see how it behaves with such "boundary" cases.

Am I being unreasonable?  (*Me*?  *Unreasonable*?  Perish the thought!) :-)

cheers,

Rolf Turner

P.S.  Note that setting REML=FALSE in the call to lmer() causes it to
produce a covariance matrix that agrees with that produced by glmmTMB()
with default setting.  However this has no impact upon the error message 
issued by lmer().

R. T.

-- 
Honorary Research Fellow
Department of Statistics
University of Auckland
Phone: +64-9-373-7599 ext. 88276