[R-sig-ME] Repeated measures in lmer().
Rolf Turner
r.turner at auckland.ac.nz
Thu Apr 2 03:47:07 CEST 2009
Yesterday I posted a query with subject line ``Puzzled by simple
example.''
The puzzlement was really due to a clumsy stupidity on my part, about
which the less said the better. :-)
The example that I was (initially) puzzled by was really a preliminary
to getting to the real issue with which I am trying to come to grips.
This is the issue of fitting repeated measures models using lmer().
The model that was fitted in my yesterday's posting was
lmer(y ~ age + (1 | child),data=ht.dat)
This assumes, as I far as I understand things (which is admittedly not
very far) a constant diagonal covariance matrix for the within-child
random effect. This is a pretty unrealistic assumption. Doug Bates
kindly informed me, many months ago, that to get a general positive
definite covariance matrix I should use the syntax
lmer(y ~ age + (age | child),data=ht.dat) # (*)
but that the model that lmer would be attempting to fit thereby would be
singular.
It has taken me this many months to get even a start at clearing away
the
cobwebs from my thinking so that I could understand this, but I now
think
I'm getting to square zero at least.
The model being fitted by (*) above is
Y = X beta + Z_1 b_1 + Z_2 b_2 + E
but really the ``Z b'' terms (child effect + age-by-child interaction)
can be combined into a single term so the model becomes just
Y = X beta + Z b + E
which is written entry by entry as
y_ij = beta_i + B_ij + E_ij
where the E_ij are all i.i.d. N(0,sigma^2_E) and the B_ij are
independent
of the E_ij, likewise Gaussian with mean 0, but each B_j =
(B_1j,...,B_4j)'
has covariance matrix Sigma (4-x-4, positive definite, but with no other
specified structure).
The reason the model is singular is that you can ``blend'' as much or
as little as you choose of sigma^2_E into the diagonal of Sigma and have
the same model. The miraculous thing is that despite the singularity,
lmer() gives the (or ``a'') correct answer, at least in the simulated
data experiments that I've tried. E.g.:
#
# Script scr.02
#
library(MASS)
library(lme4)
# Generate the data. No attempt is made to make these
# data realistic.
NCHILD <- 20
set.seed(42)
Beta <- c(40,50,60,70)
Sigma <- 2*matrix(c(1.00,0.50,0.25,0.10,
0.50,1.00,0.50,0.25,
0.25,0.50,1.00,0.50,
0.10,0.25,0.50,1.00),ncol=4)
MB <- mvrnorm(NCHILD,rep(0,4),Sigma)
B <- as.vector(t(MB))
E <- rnorm(4*NCHILD,0,0)
Y <- Beta + B + E
ht.dat.2 <- data.frame(y=Y,age=factor(rep(4:7,NCHILD)),child=factor
(rep(1:NCHILD,each=4)))
# Fit with lmer():
f2 <- lmer(y ~ 0 + age + (0+age|child),data=ht.dat.2,REML=TRUE)
V <- VarCorr(f2)
M <- V[[1]]
S <- attr(V,"sc")
covb.lmer <- M + diag(rep(S^2,4))
attributes(covb.lmer) <- attributes(covb.lmer)["dim"]
covb.chk <- var(MB)
# Clean up:
rm(NCHILD,Beta,Sigma,MB,B,E,Y,V,M,S)
You see that if you take the residual variance component and add it to
the diagonal of the ``child'' covariance matrix (``Sigma-hat'') you get
(very close to) exactly the right answer. For NCHILD = 20 as in the
foregoing example we get:
> range(covb.lmer-covb.chk)
[1] -1.288370e-04 1.467542e-05
Without even a warning. If we jack NCHILD up to 500 we get a warning
about false convergence (well, I did yesterday, but when I sourced
scr.02 just now, I didn't --- ???) but the agreement is even better:
> range(covb.lmer-covb.chk)
[1] -6.019627e-06 6.337365e-06
My question is: Is this behaviour reliable? Can I depend on this sort
of work-around for fitting a repeated measures model where there is
no replication, i.e. one observation per (age,child) combination, whence
no room for an E_ij term in the model? Suppose the repeated measures
model is complicated by other factors. E.g. instead of all girls we
have both sexes and put a sex factor into the model. Or the children
are nested within localities or countries. Can I still fit the model(s)
in this way?
``Ideally'' I'd like to be able to fit the model
y_ij = beta_i + B_ij
i.e. suppress the residual error E_ij term (or lump it in with the
child random effect). This is presumably not possible as lmer() is
currently constituted. Would it/could it be possible? I.e. could the
code be adjusted to accommodate this desideratum? Would it be a
coding nightmare to do this? It would seem to me to a fairly
*desirable* feature, since repeated measures data with no
replications are not at all uncommon. But of course desirable and
feasible are two very different things.
On another tack: Have I (at last) got hold of the right end of the
stick in respect of such models? Or is my understanding still flawed?
Am I going at fitting repeated measures models in the right way?
Is there another --- better --- way to do this using lmer()?
I have tried a similar simulation experiment with two replicate
observations
for each child. (Height measured twice on each occasion; the E_ijk,
k=1,2,
now representing measurement error.) The answers appeared to make sense
although in these circumstances one cannot do an exact check since the
estimate of ``Sigma'' produced by lmer() will not be the same as var(MB)
*because of* the measurement error.
Comments? Suggestions? Enlightenment? Feedback? Such from the horse's
mouth (i.e. Doug Bates) would be particularly welcome.
cheers,
Rolf Turner
######################################################################
Attention:\ This e-mail message is privileged and confid...{{dropped:9}}
More information about the R-sig-mixed-models
mailing list