[R] [R-sig-ME] lme nesting/interaction advice
Rolf Turner
r.turner at auckland.ac.nz
Tue May 13 00:39:36 CEST 2008
On 13/05/2008, at 4:09 AM, Douglas Bates wrote:
> I'm entering this discussion late so I may be discussing issues that
> have already been addressed.
>
> As I understand it, Federico, you began by describing a model for data
> in which two factors have a fixed set of levels and one factor has an
> extensible, or "random", set of levels and you wanted to fit a model
> that you described as
>
> y ~ effect1 * effect2 * effect3
>
> The problem is that this specification is not complete.
At *last* (as Owl said to Rabbit) we're getting somewhere!!!
I always knew that there was some basic fundamental point
about this business that I (and I believe many others) were
simply missing. But I could not for the life of me get anyone
to explain to me what that point was. Or to put it another
way, I was never able to frame a question that would illuminate
just what it was that I wasn't getting.
I now may be at a stage where I can start asking the right
questions.
> An interaction of factors with fixed levels and a factor with random
> levels can mean, in the lmer specification,
>
> lmer(y ~ effect1 * effect2 + (1| effect3) + (1|
> effect1:effect2:effect3), ...)
>
> or
>
> lmer(y ~ effect1 * effect2 + (effect1*effect2 | effect3), ...)
>
> or other variations. When you specify a random effect or an random
> interaction term you must, either explicitly or implicitly, specify
> the form of the variance-covariance matrix associated with those
> random effects.
>
> The "advantage" that other software may provide for you is that it
> chooses the model for you but that, of course, means that you only
> have the one choice.
Now may I start asking what I hope are questions that will lift
the fog a bit?
Let us for specificity consider a three-way model with two
fixed effects and one random effect from the good old Rothamstead style
agricultural experiment context: Suppose we have a number of
species/breeds of wheat (say) and a number of fertilizers.
These are fixed effects. And we have a number of fields (blocks)
--- a random effect. Each breed-fertilizer combination is
applied a number of times in each field. We ***assume*** that
that the field or block effect is homogeneous throughout. This
may or may not be a ``good'' assumption, but it's not completely
ridiculous and would often be made in practice. And probably
*was* made at Rothamstead. The response would be something like
yield in bushels per acre.
The way that I would write the ``full'' model for this setting,
in mathematical style is:
Y_ijkl = mu + alpha_i + beta_j + (alpha.beta)_ij + C_k + (alpha.C)_ik
+ (beta.C)_jk + (alpha.beta.C)_ijk + E_ijkl
The alpha_i and beta_j are parameters corresponding to breed and
fertilizer
respectively; the C_k are random effects corresponding to fields or
blocks.
Any effect ``involving'' C is also random.
The assumptions made by the Package-Which-Must-Not-Be-Named are (I
think)
that
C_k ~ N(0,sigma_C^2)
(alpha.C)_ik ~ N(0,sigma_aC^2)
(beta.C)jk ~ N(0,sigma_bC^2)
(alpha.beta.C)_ijk ~ N(0,sigma_abC^2)
E_ijkl ~ N(0,sigma^2)
and these random variables are *all independent*.
Ahhhhhhhh ... perhaps I'm on the way to answering my own question. Is
it this assumption of ``all independent'' which is questionable? It
seemed innocent enough when I first learned about this stuff, lo these
many years ago. But .... mmmmmaybe not!
To start with: What would be the lmer syntax to fit the foregoing
(possibly naive) model? I am sorry, but I really cannot get my head
around the syntax of lmer model specification, and I've tried. I
really have. Hard. I know I must be starting from the wrong place,
but I haven't a clue as to what the *right* place to start from is.
And if I'm in that boat, I will wager Euros to pretzels that there
are others in it. I know that I'm not the brightest bulb in the
chandelier, but I'm not the dullest either.
Having got there: Presuming that I'm more-or-less on the right track
in my foregoing conjecture that it's the over-simple dependence
structure
that is the problem with what's delivered by the Package-Which-Must-
Not-Be-Named,
how might one go about being less simple-minded? I.e. what might be
some
more realistic dependence structures, and how would one specify
these in lmer?
And how would one assess whether the assumed dependence structure gives
a reasonable fit to the data?
> If you can describe how many variance components you think should be
> estimated in your model and what they would represent then I think it
> will be easier to describe how to fit the model.
How does this fit in with my conjecture (above) about what I've been
missing all these years? Does it fit? How many variance components
are there in the ``naive'' model? It looks like 5 to me ... but maybe
I'm totally out to lunch in what I think I'm understanding at this
stage.
(And besides --- there are three sorts of statistician; those who
can count,
and those who can't.)
Thank you for your indulgence.
cheers,
Rolf Turner
######################################################################
Attention:\ This e-mail message is privileged and confid...{{dropped:9}}
More information about the R-help
mailing list