[R] [R-sig-ME] lme nesting/interaction advice

Tue May 13 00:39:36 CEST 2008

On 13/05/2008, at 4:09 AM, Douglas Bates wrote:

> I'm entering this discussion late so I may be discussing issues that
> have already been addressed.
>
> As I understand it, Federico, you began by describing a model for data
> in which two factors have a fixed set of levels and one factor has an
> extensible, or "random", set of levels and you wanted to fit a model
> that you described as
>
> y ~ effect1 * effect2 * effect3
>
> The problem is that this specification is not complete.

	At *last* (as Owl said to Rabbit) we're getting somewhere!!!

	I always knew that there was some basic fundamental point
	about this business that I (and I believe many others) were
	simply missing.  But I could not for the life of me get anyone
	to explain to me what that point was.  Or to put it another
	way, I was never able to frame a question that would illuminate
	just what it was that I wasn't getting.

	I now may be at a stage where I can start asking the right
	questions.

> An interaction of factors with fixed levels and a factor with random
> levels can mean, in the lmer specification,
>
> lmer(y ~ effect1 * effect2 + (1| effect3) + (1| 
> effect1:effect2:effect3), ...)
>
> or
>
> lmer(y ~ effect1 * effect2 + (effect1*effect2 | effect3), ...)
>
> or other variations.  When you specify a random effect or an random
> interaction term you must, either explicitly or implicitly, specify
> the form of the variance-covariance matrix associated with those
> random effects.
>
> The "advantage" that other software may provide for you is that it
> chooses the model for you but that, of course, means that you only
> have the one choice.

	Now may I start asking what I hope are questions that will lift
	the fog a bit?

	Let us for specificity consider a three-way model with two
	fixed effects and one random effect from the good old Rothamstead style
	agricultural experiment context:  Suppose we have a number of
	species/breeds of wheat (say) and a number of fertilizers.
	These are fixed effects.  And we have a number of fields (blocks)
	--- a random effect.  Each breed-fertilizer combination is
	applied a number of times in each field.  We ***assume*** that
	that the field or block effect is homogeneous throughout.  This
	may or may not be a ``good'' assumption, but it's not completely
	ridiculous and would often be made in practice.  And probably
	*was* made at Rothamstead.  The response would be something like
	yield in bushels per acre.

	The way that I would write the ``full'' model for this setting,
	in mathematical style is:

	Y_ijkl = mu + alpha_i + beta_j + (alpha.beta)_ij + C_k + (alpha.C)_ik
                     + (beta.C)_jk + (alpha.beta.C)_ijk + E_ijkl

	The alpha_i and beta_j are parameters corresponding to breed and  
fertilizer
	respectively; the C_k are random effects corresponding to fields or  
blocks.
	Any effect ``involving'' C is also random.

	The assumptions made by the Package-Which-Must-Not-Be-Named are (I  
think)
	that

		C_k ~ N(0,sigma_C^2)
		(alpha.C)_ik ~ N(0,sigma_aC^2)
		(beta.C)jk ~ N(0,sigma_bC^2)
		(alpha.beta.C)_ijk ~ N(0,sigma_abC^2)
		E_ijkl ~ N(0,sigma^2)

	and these random variables are *all independent*.

	Ahhhhhhhh ... perhaps I'm on the way to answering my own question.  Is
	it this assumption of ``all independent'' which is questionable?  It
	seemed innocent enough when I first learned about this stuff, lo these
	many years ago.  But .... mmmmmaybe not!

	To start with:  What would be the lmer syntax to fit the foregoing
	(possibly naive) model?  I am sorry, but I really cannot get my head
	around the syntax of lmer model specification, and I've tried.  I
	really have.  Hard.  I know I must be starting from the wrong place,
	but I haven't a clue as to what the *right* place to start from is.
	And if I'm in that boat, I will wager Euros to pretzels that there
	are others in it.  I know that I'm not the brightest bulb in the
	chandelier, but I'm not the dullest either.

	Having got there:  Presuming that I'm more-or-less on the right track
	in my foregoing conjecture that it's the over-simple dependence  
structure
	that is the problem with what's delivered by the Package-Which-Must- 
Not-Be-Named,
	how might one go about being less simple-minded?  I.e. what might be  
some
	more realistic dependence structures, and how would one specify  
these in lmer?
	And how would one assess whether the assumed dependence structure gives
	a reasonable fit to the data?

> If you can describe how many variance components you think should be
> estimated in your model and what they would represent then I think it
> will be easier to describe how to fit the model.

	How does this fit in with my conjecture (above) about what I've been
	missing all these years?  Does it fit?  How many variance components
	are there in the ``naive'' model?  It looks like 5 to me ... but maybe
	I'm totally out to lunch in what I think I'm understanding at this  
stage.
	(And besides --- there are three sorts of statistician; those who  
can count,
	and those who can't.)

	Thank you for your indulgence.

		 cheers,

			Rolf Turner

######################################################################
Attention:\ This e-mail message is privileged and confid...{{dropped:9}}