[R-sig-ME] Repeated measures in lmer().

Rolf Turner r.turner at auckland.ac.nz
Fri Apr 3 00:19:27 CEST 2009


On 3/04/2009, at 9:31 AM, Douglas Bates wrote:

	<My original post snipped out here.>

> I'm glad to see the word "mouth" in that sentence.  Occasionally I
> have been described with reference to another part of a horse's
> anatomy. :-)

	I can't imagine anyone being so ***rude***!  Especially
	not on ***this*** list! :-)

> I appreciate your difficulty in phrasing a model such as this for
> lmer.  In fact, I don't think one could reliably fit the model that
> you want to fit using lmer.

	That's what I was afraid of.  Tossing and turning on my bed
	of pain last night I realized that even though I could jigger
	the estimate of the within-child covariance matrix to get it
	right, any inferences based on such a fit would probably be
	all out to lunch.

> If I may be so bold, I would suggest that the fault is more in the
> model formulation than in the software.

	I cheerfully admit that the model is shonky.  But it was/is
	a *toy* model to illustrate the problem.  I shouldn't have
	used ``age'' as the fixed effect.  I cobbled together my
	toy example rather too quickly and with too little creativity.

> You are taking what may be termed a "classical" approach to repeated
> measures data, specifically longitudinal data.  You require that the
> data be expressible as a subjects by occasions table and essentially
> extracting means and covariances from the columns of that table.
> Sometimes we refer to that organization as the wide format, as opposed
> to the long format where each row corresponds to a single observation
> with covariates of subject and occasion.  The key is that you are
> regarding age as an unordered categorical variable.

	In the real problem I am now thinking that I would want to
	consider the ``real'' variable corresponding to age to be
	an ***ordered*** categorical variable.  See below.  But
	I'm all at sea with ordered factors. :-(
>
> The wide format view works fine until it doesn't.  When I first
> started looking at longitudinal data I saw discussions of what to do
> about missing data or what to do if the nominal ages are 10, 11 and 12
> years but you actually see the subject at ages 10.10, 10.92 and 12.03
> years.  If you think of putting the data into the long format these
> questions don't come up.  If you are missing an observation then you
> delete that row.  If different subjects are recorded at different ages
> then so be it.  Record the ages at which you actually saw them.
>
> Then examine the data, in my case I would use a lattice plot such as
> the enclosed, to see what a typical within-subject trajectory is.  The
> plot is generated with the enclosed scripts and some models are fit.
> The data are balanced with respect to the number of occasions at which
> the subject's height is measured but the actual ages are somewhat
> unbalanced.  One can go ahead and fit a model to these data using age
> as a covariate, even though there are 14, not 9, unique ages.
>
> The model you want to fit would have 9 distinct random effects for
> each subject, which would be a saturated model.  I would claim that
> you almost never need a saturated model like that.  Here I have
> allowed for fixed and random effects to a third-order polynomial but
> even that is probably stretching the point.  Looking at the data plot
> doesn't convince me that fitting cubic terms has practical
> significance, even if it is statistically significant.
>
> As I write this I hear George Box's voice extolling the virtues of
> parsimony in a model.  Whenever you have a covariate like time I think
> it is a waste to convert it to a categorical covariate, even if
> everyone was measured at exactly the same times or ages.
>
> The bottom line is that you can't fit a saturated linear mixed model
> with lmer reliably because lmer will always throw in one variance
> parameter in addition to those generated by the random-effects terms.
> So my advice is "Don't do that." :-)

	Okay, I won't.  But in my ``real'' data the fixed effect (the
	one involving repeated measures) is not age but rather ``school
	year'' (or rather --- more complicatedly still --- ``school year
	gap'').  My clients are interested in the differences in test
	scores between the end of year 6 and the beginning of year 6,
	the beginning of year 7 and the end of year 6, and the end of
	year 7 and the beginning of year 7, i.e. the three gaps in

		|---year 6---|---summer---|---year 7---|

	I didn't want to muddy the waters in my toy example by trying
	to explain all this complication.  (And the response variable
	is test scores, not heights! So we can have decreases which
	means negative values of the response.)

	Moreover I have a bunch of other variates to take into consideration;
	sex, ethnicity, first language (fixed effects) and school (random
	effect within which children are nested).

	I thought I should get my head around the ultra-simple scenario
	described in my toy example before I tackled the messiness of reality.

	So I have the variable ``gap'' (with three values) instead of the
	age variable that I used in my toy example.  It seems to me that
	``gap'' ***cannot*** be treated as a numeric variable, like age.

	So how should I treat it?  It would seem to make sense to treat it
	as an ***ordered*** factor.  Bozhemoi!  I've never understood ordered
	factors either. :-(

	Okay, s'pose I have a data frame with columns: ID, gap, test.res, sex,
	ethnicity, language, school.  Let's ignore all but the first three
	columns to start with to keep things simple.  As indicated above I am
	thinking of taking ``gap'' to be an ordered factor with levels 1, 2, and 3.

	Can you suggest a sensible recipe or two that I could try to get myself
	started with?
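
	As background to the ordered-factor question above, here is a minimal
	base-R sketch of how R codes an ordered versus an unordered factor.
	The data frame `dat`, its values, and the helper columns `gap.ord`
	and `gap.fac` are hypothetical.  Only the column names ID, gap and
	test.res come from the message.  It assumes R's default contrast
	settings (contr.treatment for unordered factors, contr.poly for
	ordered ones).  The lmer() call in the final comment is one possible
	starting point rather than a recommendation, and is not run here.

```r
# Hypothetical toy data: 2 children, 3 gaps each (all values are made up).
dat <- data.frame(
  ID       = factor(rep(c("c1", "c2"), each = 3)),
  gap      = rep(1:3, times = 2),
  test.res = c(4.0, -1.5, 3.0, 5.5, -0.5, 2.0)
)

# Ordered factor: by default R codes it with polynomial contrasts
# (contr.poly), giving a linear (.L) and a quadratic (.Q) column.
dat$gap.ord <- factor(dat$gap, levels = 1:3, ordered = TRUE)
contrasts(dat$gap.ord)

# Unordered factor: default treatment contrasts, with level 1 as baseline.
dat$gap.fac <- factor(dat$gap, levels = 1:3)
contrasts(dat$gap.fac)

# The two codings give different columns, hence different coefficients ...
X.ord <- model.matrix(~ gap.ord, data = dat)
X.fac <- model.matrix(~ gap.fac, data = dat)
colnames(X.ord)
colnames(X.fac)

# ... but the columns span the same space, so the fitted fixed-effect
# means are identical.  Only the parameterization changes.
fit.ord <- lm(test.res ~ gap.ord, data = dat)
fit.fac <- lm(test.res ~ gap.fac, data = dat)
all.equal(unname(fitted(fit.ord)), unname(fitted(fit.fac)))

# A possible starting model (assumes the lme4 package; not run here):
#   lme4::lmer(test.res ~ gap + (1 | ID), data = dat)
# where gap is either of the factor versions above.
```

	With three levels, the ordered coding reparameterizes the same three
	gap means as an overall level plus linear and quadratic trends, so the
	choice affects how the coefficients are read and not the fitted means.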

	I'm unclear as to the covariance structure induced or assumed in the
	polynomial models that you have fitted to the Oxboys data.  There are,
	for each boy, 9 observations of the boy's height, at various ages.  If
	we let the heights for a particular boy be (H_1,...,H_9) what can we
	say --- or what are we assuming --- about, e.g., Cov(H_3,H_7)?  Is this
	expressed as some function of (age_3 - age_7) for that boy?  Or do
	these covariances not come into the picture at all?
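
	For reference, here is a sketch of the marginal covariance implied by
	a model of the kind described in the quoted text (fixed and random
	polynomial effects of age up to third order).  The symbols b_i, \Psi,
	\sigma^2 and z_{ij} are notation introduced here.  An unstructured
	\Psi and independent residuals are assumptions, since the fitted
	scripts are not shown in this message.

```latex
\[
H_{ij} = \sum_{k=0}^{3} (\beta_k + b_{ik})\,\mathrm{age}_{ij}^{k} + \varepsilon_{ij},
\qquad b_i \sim N(0, \Psi), \quad
\varepsilon_{ij} \sim N(0, \sigma^2) \ \text{independently},
\]
\[
\operatorname{Cov}(H_{ij}, H_{ij'}) = z_{ij}^{\top} \Psi \, z_{ij'} + \sigma^2 \delta_{jj'},
\qquad z_{ij} = \bigl(1,\ \mathrm{age}_{ij},\ \mathrm{age}_{ij}^{2},\ \mathrm{age}_{ij}^{3}\bigr)^{\top}.
\]
```

	Under these assumptions Cov(H_3,H_7) = z_3' \Psi z_7 for a given boy.
	It depends on both ages through their powers, and in general is not a
	function of the difference (age_3 - age_7) alone.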

	Grateful as always for enlightenment.

		cheers,

			Rolf

P. S.  If anyone wants to have a go at analyzing the real data ..... :-)
