[R-sig-ME] Repeated measures in lmer().

Thu Apr 2 03:47:07 CEST 2009

Yesterday I posted a query with subject line ``Puzzled by simple  
example.''

The puzzlement was really due to a clumsy stupidity on my part, about
which the less said the better. :-)

The example that I was (initially) puzzled by was really a preliminary
to getting to the real issue with which I am trying to come to grips.

This is the issue of fitting repeated measures models using lmer().

The model that was fitted in my yesterday's posting was

	lmer(y ~ age + (1 | child),data=ht.dat)

This assumes, as I far as I understand things (which is admittedly not
very far) a constant diagonal covariance matrix for the within-child
random effect.  This is a pretty unrealistic assumption.  Doug Bates
kindly informed me, many months ago, that to get a general positive
definite covariance matrix I should use the syntax

	lmer(y ~ age + (age | child),data=ht.dat)   # (*)

but that the model that lmer would be attempting to fit thereby would be
singular.

It has taken me this many months to get even a start at clearing away  
the
cobwebs from my thinking so that I could understand this, but I now  
think
I'm getting to square zero at least.

The model being fitted by (*) above is

	Y = X beta + Z_1 b_1 + Z_2 b_2 + E

but really the ``Z b'' terms (child effect + age-by-child interaction)
can be combined into a single term so the model becomes just

	Y = X beta + Z b + E

which is written entry by entry as

	y_ij = beta_i + B_ij + E_ij

where the E_ij are all i.i.d. N(0,sigma^2_E) and the B_ij are  
independent
of the E_ij, likewise Gaussian with mean 0, but each B_j =  
(B_1j,...,B_4j)'
has covariance matrix Sigma (4-x-4, positive definite, but with no other
specified structure).

The reason the model is singular is that you can ``blend'' as much or
as little as you choose of sigma^2_E into the diagonal of Sigma and have
the same model.  The miraculous thing is that despite the singularity,
lmer() gives the (or ``a'') correct answer, at least in the simulated
data experiments that I've tried.  E.g.:

#
# Script scr.02
#

library(MASS)
library(lme4)

# Generate the data.  No attempt is made to make these
# data realistic.
NCHILD <- 20
set.seed(42)
Beta <- c(40,50,60,70)
Sigma <- 2*matrix(c(1.00,0.50,0.25,0.10,
                     0.50,1.00,0.50,0.25,
                     0.25,0.50,1.00,0.50,
		    0.10,0.25,0.50,1.00),ncol=4)
MB <- mvrnorm(NCHILD,rep(0,4),Sigma)
B  <- as.vector(t(MB))
E  <- rnorm(4*NCHILD,0,0)
Y  <- Beta + B + E
ht.dat.2 <- data.frame(y=Y,age=factor(rep(4:7,NCHILD)),child=factor 
(rep(1:NCHILD,each=4)))

# Fit with lmer():
f2 <- lmer(y ~ 0 + age + (0+age|child),data=ht.dat.2,REML=TRUE)
V <- VarCorr(f2)
M <- V[[1]]
S <- attr(V,"sc")
covb.lmer <- M + diag(rep(S^2,4))
attributes(covb.lmer) <- attributes(covb.lmer)["dim"]
covb.chk <- var(MB)

# Clean up:
rm(NCHILD,Beta,Sigma,MB,B,E,Y,V,M,S)

You see that if you take the residual variance component and add it to
the diagonal of the ``child'' covariance matrix (``Sigma-hat'') you get
(very close to) exactly the right answer.  For NCHILD = 20 as in the
foregoing example we get:

 > range(covb.lmer-covb.chk)
[1] -1.288370e-04  1.467542e-05

Without even a warning.  If we jack NCHILD up to 500 we get a warning
about false convergence (well, I did yesterday, but when I sourced
scr.02 just now, I didn't --- ???) but the agreement is even better:

 > range(covb.lmer-covb.chk)
[1] -6.019627e-06  6.337365e-06

My question is:  Is this behaviour reliable?  Can I depend on this sort
of work-around for fitting a repeated measures model where there is
no replication, i.e. one observation per (age,child) combination, whence
no room for an E_ij term in the model?  Suppose the repeated measures
model is complicated by other factors.  E.g. instead of all girls we
have both sexes and put a sex factor into the model.  Or the children
are nested within localities or countries.  Can I still fit the model(s)
in this way?

``Ideally'' I'd like to be able to fit the model

	y_ij = beta_i + B_ij

i.e. suppress the residual error E_ij term (or lump it in with the
child random effect).   This is presumably not possible as lmer() is
currently constituted.  Would it/could it be possible?  I.e. could the
code be adjusted to accommodate this desideratum?  Would it be a
coding nightmare to do this?  It would seem to me to a fairly
*desirable* feature, since repeated measures data with no
replications are not at all uncommon.  But of course desirable and
feasible are two very different things.

On another tack:  Have I (at last) got hold of the right end of the
stick in respect of such models?  Or is my understanding still flawed?
Am I going at fitting repeated measures models in the right way?
Is there another --- better --- way to do this using lmer()?

I have tried a similar simulation experiment with two replicate  
observations
for each child.  (Height measured twice on each occasion; the E_ijk,  
k=1,2,
now representing measurement error.)  The answers appeared to make sense
although in these circumstances one cannot do an exact check since the
estimate of ``Sigma'' produced by lmer() will not be the same as var(MB)
*because of* the measurement error.

Comments?  Suggestions?  Enlightenment? Feedback?  Such from the horse's
mouth (i.e. Doug Bates) would be particularly welcome.

	cheers,

		Rolf Turner

######################################################################
Attention:\ This e-mail message is privileged and confid...{{dropped:9}}