[R-sig-ME] [R] lme nesting/interaction advice
John Maindonald
john.maindonald at anu.edu.au
Fri May 16 06:28:59 CEST 2008
I've been looking back over this discussion.
Another model that one can fit using lme is:
> lme(score~Machine, random=list(Worker=pdIdent(~0+Machine)),
+ weights=varIdent(form=~1|Machine), data=Machines)
Linear mixed-effects model fit by REML
Data: Machines
Log-restricted-likelihood: -108.9547
Fixed: score ~ Machine
(Intercept) MachineB MachineC
52.355556 7.966667 13.916667
Random effects:
Formula: ~0 + Machine | Worker
Structure: Multiple of an Identity
MachineA MachineB MachineC Residual
StdDev: 6.06209 6.06209 6.06209 1.148581
Variance function:
Structure: Different standard deviations per stratum
Formula: ~1 | Machine
Parameter estimates:
A B C
1.0000000 0.8713263 0.5859709
Number of Observations: 54
Number of Groups: 6
This insists (I think) that conditional on the random effect for the
worker,
the worker/machine random effects be independent,
but allows them to have different variances. I am wondering whether
it is possible to fit such a model using lmer().
[In this example the large estimated correlations suggest that it is not
a sensible model.]
John Maindonald email: john.maindonald at anu.edu.au
phone : +61 2 (6125)3473 fax : +61 2(6125)5549
Centre for Mathematics & Its Applications, Room 1194,
John Dedman Mathematical Sciences Building (Building 27)
Australian National University, Canberra ACT 0200.
On 14 May 2008, at 2:49 AM, Douglas Bates wrote:
> On Mon, May 12, 2008 at 5:39 PM, Rolf Turner
> <r.turner at auckland.ac.nz> wrote:
>>
>> On 13/05/2008, at 4:09 AM, Douglas Bates wrote:
>>
>>> I'm entering this discussion late so I may be discussing issues that
>>> have already been addressed.
>>>
>>> As I understand it, Federico, you began by describing a model for
>>> data
>>> in which two factors have a fixed set of levels and one factor has
>>> an
>>> extensible, or "random", set of levels and you wanted to fit a model
>>> that you described as
>>>
>>> y ~ effect1 * effect2 * effect3
>>>
>>> The problem is that this specification is not complete.
>>
>> At *last* (as Owl said to Rabbit) we're getting somewhere!!!
>>
>> I always knew that there was some basic fundamental point
>> about this business that I (and I believe many others) were
>> simply missing. But I could not for the life of me get anyone
>> to explain to me what that point was. Or to put it another
>> way, I was never able to frame a question that would illuminate
>> just what it was that I wasn't getting.
>>
>> I now may be at a stage where I can start asking the right
>> questions.
>>
>>> An interaction of factors with fixed levels and a factor with random
>>> levels can mean, in the lmer specification,
>>>
>>> lmer(y ~ effect1 * effect2 + (1| effect3) + (1|
>>> effect1:effect2:effect3),
>>> ...)
>>>
>>> or
>>>
>>> lmer(y ~ effect1 * effect2 + (effect1*effect2 | effect3), ...)
>>>
>>> or other variations. When you specify a random effect or an random
>>> interaction term you must, either explicitly or implicitly, specify
>>> the form of the variance-covariance matrix associated with those
>>> random effects.
>>>
>>> The "advantage" that other software may provide for you is that it
>>> chooses the model for you but that, of course, means that you only
>>> have the one choice.
>>
>> Now may I start asking what I hope are questions that will lift
>> the fog a bit?
>>
>> Let us for specificity consider a three-way model with two
>> fixed effects and one random effect from the good old
>> Rothamstead
>> style
>> agricultural experiment context: Suppose we have a number of
>> species/breeds of wheat (say) and a number of fertilizers.
>> These are fixed effects. And we have a number of fields
>> (blocks)
>> --- a random effect. Each breed-fertilizer combination is
>> applied a number of times in each field. We ***assume*** that
>> that the field or block effect is homogeneous throughout. This
>> may or may not be a ``good'' assumption, but it's not completely
>> ridiculous and would often be made in practice. And probably
>> *was* made at Rothamstead. The response would be something like
>> yield in bushels per acre.
>>
>> The way that I would write the ``full'' model for this setting,
>> in mathematical style is:
>>
>> Y_ijkl = mu + alpha_i + beta_j + (alpha.beta)_ij + C_k +
>> (alpha.C)_ik
>> + (beta.C)_jk + (alpha.beta.C)_ijk + E_ijkl
>>
>> The alpha_i and beta_j are parameters corresponding to breed and
>> fertilizer
>> respectively; the C_k are random effects corresponding to
>> fields or
>> blocks.
>> Any effect ``involving'' C is also random.
>>
>> The assumptions made by the Package-Which-Must-Not-Be-Named
>> are (I
>> think)
>> that
>>
>> C_k ~ N(0,sigma_C^2)
>> (alpha.C)_ik ~ N(0,sigma_aC^2)
>> (beta.C)jk ~ N(0,sigma_bC^2)
>> (alpha.beta.C)_ijk ~ N(0,sigma_abC^2)
>> E_ijkl ~ N(0,sigma^2)
>>
>> and these random variables are *all independent*.
>>
>> Ahhhhhhhh ... perhaps I'm on the way to answering my own
>> question.
>> Is
>> it this assumption of ``all independent'' which is
>> questionable? It
>> seemed innocent enough when I first learned about this stuff, lo
>> these
>> many years ago. But .... mmmmmaybe not!
>>
>> To start with: What would be the lmer syntax to fit the
>> foregoing
>> (possibly naive) model? I am sorry, but I really cannot get
>> my head
>> around the syntax of lmer model specification, and I've
>> tried. I
>> really have. Hard. I know I must be starting from the wrong
>> place,
>> but I haven't a clue as to what the *right* place to start
>> from is.
>> And if I'm in that boat, I will wager Euros to pretzels that
>> there
>> are others in it. I know that I'm not the brightest bulb in the
>> chandelier, but I'm not the dullest either.
>
> Thanks for the questions, Rolf. I completely agree that mixed model
> specification can be an extremely confusing area.
>
> Let's consider a set of models for the Machines data reproduced (from
> Milliken and Johnson) in Pinheiro and Bates and available in the MEMSS
> package.
>
>> library(lme4)
> Loading required package: Matrix
> Loading required package: lattice
>
> Attaching package: 'Matrix'
>
>
> The following object(s) are masked from package:stats :
>
> xtabs
>> data("Machines", package = "MEMSS")
>> str(Machines)
> 'data.frame': 54 obs. of 3 variables:
> $ Worker : Factor w/ 6 levels "1","2","3","4",..: 1 1 1 2 2 2 3 3 3
> 4 ...
> $ Machine: Factor w/ 3 levels "A","B","C": 1 1 1 1 1 1 1 1 1 1 ...
> $ score : num 52 52.8 53.1 51.8 52.8 53.1 60 60.2 58.4 51.1 ...
>
> We consider the Machine factor to have a fixed set of levels in that
> we only consider these three machines. The levels of the Worker
> factor represent a sample from the set of potential operators. As you
> might imagine from this description, I now think of the distinction
> between "fixed" and "random" as being associated with the factor, not
> necessarily the "effects".
>
> If you plot these data
>> dotplot(reorder(Worker, score) ~ score, Machines, groups = Machine,
>> type = c("g", "p", "a"), xlab = "Efficiency score", ylab =
>> "Worker", auto.key = list(columns = 3, lines = TRUE))
>
> (resulting PDF enclosed) you will see evidence of an interaction.
> That is, some workers have noticeably different score patterns on the
> three machines than do others. Worker 6 on machine B is the most
> striking example.
>
> One way to model this interaction is to say that there is a random
> effect for each worker and a separate random effect for each
> worker/machine combination. If the random effects for the
> worker/machine combinations are assumed to be independent with
> constant variance then one expresses the model as
>
>> print(fm1 <- lmer(score ~ Machine + (1|Worker) + (1|
>> Machine:Worker), Machines), corr = FALSE)
> Linear mixed model fit by REML
> Formula: score ~ Machine + (1 | Worker) + (1 | Machine:Worker)
> Data: Machines
> AIC BIC logLik deviance REMLdev
> 227.7 239.6 -107.8 225.5 215.7
> Random effects:
> Groups Name Variance Std.Dev.
> Machine:Worker (Intercept) 13.90963 3.72956
> Worker (Intercept) 22.85528 4.78072
> Residual 0.92464 0.96158
> Number of obs: 54, groups: Machine:Worker, 18; Worker, 6
>
> Fixed effects:
> Estimate Std. Error t value
> (Intercept) 52.356 2.486 21.062
> MachineB 7.967 2.177 3.659
> MachineC 13.917 2.177 6.393
>
> An equivalent formulation is
>> print(fm1 <- lmer(score ~ Machine + (1|Worker/Machine), Machines),
>> corr = FALSE)
> Linear mixed model fit by REML
> Formula: score ~ Machine + (1 | Worker/Machine)
> Data: Machines
> AIC BIC logLik deviance REMLdev
> 227.7 239.6 -107.8 225.5 215.7
> Random effects:
> Groups Name Variance Std.Dev.
> Machine:Worker (Intercept) 13.90963 3.72956
> Worker (Intercept) 22.85528 4.78072
> Residual 0.92464 0.96158
> Number of obs: 54, groups: Machine:Worker, 18; Worker, 6
>
> Fixed effects:
> Estimate Std. Error t value
> (Intercept) 52.356 2.486 21.062
> MachineB 7.967 2.177 3.659
> MachineC 13.917 2.177 6.393
>
> The expression (1|Worker/Machine) is just "syntactic sugar". It is
> expanded to (1|Worker) + (1|Machine:Worker) before the model matrices
> are created.
>
> If you want to start with the formula and see what that means for the
> model then use these rules:
>
> - a term including the '|' operator is a random effects term
> - if the left-hand side of the '|' operator is 1 then the random
> effects are scalar random effects, one for each level of the factor on
> the right of the '|'
> - random effects associated with different terms are independent
> - random effects associated with different levels of the factor within
> a term are independent
> - the variance of the random effects within the same term is constant
>
> However, there is another mixed-effects model that could make sense
> for these data. Suppose I consider the variations associated with
> each worker as a vector of length 3 (Machines A, B and C) with a
> symmetric, positive semidefinite 3 by 3 variance-covariance matrix. I
> fit that model as
>
>> print(fm2 <- lmer(score ~ Machine + (Machine|Worker), Machines),
>> corr = FALSE)
> Linear mixed model fit by REML
> Formula: score ~ Machine + (Machine | Worker)
> Data: Machines
> AIC BIC logLik deviance REMLdev
> 228.3 248.2 -104.2 216.6 208.3
> Random effects:
> Groups Name Variance Std.Dev. Corr
> Worker (Intercept) 16.64051 4.07928
> MachineB 34.54670 5.87764 0.484
> MachineC 13.61398 3.68971 -0.365 0.297
> Residual 0.92463 0.96158
> Number of obs: 54, groups: Worker, 6
>
> Fixed effects:
> Estimate Std. Error t value
> (Intercept) 52.356 1.681 31.151
> MachineB 7.967 2.421 3.291
> MachineC 13.917 1.540 9.037
>
> It may be more meaningful to write it as
>
>> print(fm3 <- lmer(score ~ Machine + (0+Machine|Worker), Machines),
>> corr = FALSE)
> Linear mixed model fit by REML
> Formula: score ~ Machine + (0 + Machine | Worker)
> Data: Machines
> AIC BIC logLik deviance REMLdev
> 228.3 248.2 -104.2 216.6 208.3
> Random effects:
> Groups Name Variance Std.Dev. Corr
> Worker MachineA 16.64097 4.07933
> MachineB 74.39557 8.62529 0.803
> MachineC 19.26646 4.38936 0.623 0.771
> Residual 0.92463 0.96158
> Number of obs: 54, groups: Worker, 6
>
> Fixed effects:
> Estimate Std. Error t value
> (Intercept) 52.356 1.681 31.150
> MachineB 7.967 2.421 3.291
> MachineC 13.917 1.540 9.037
>
> Now we are fitting 3 variances and 3 covariances for the random
> effects instead of the previous models which had two variances. The
> difference in the models is exactly what made you pause - the simpler
> model assumes that, conditional on the random effect for the worker,
> the worker/machine random effects are independent and have constant
> variance. In the more general models the worker/machine interactions
> are allowed to be correlated within worker.
>
> It is more common to allow this kind of correlation within subject in
> models for longitudinal data (the Laird-Ware formulation) where each
> subject has a random effect for the intercept and a random effect for
> the slope with respect to time and these can be correlated. However,
> this type of representation can make sense with a factor on the left
> hand side of the '|' operator, like the Machine factor here. If that
> factor has a large number of levels then the model quickly becomes
> unwieldy because the number of variance-covariance parameters to
> estimate is quadratic in the number of levels of the factor on the
> lhs.
>
> I hope this helps.
>
>> Having got there: Presuming that I'm more-or-less on the
>> right track
>> in my foregoing conjecture that it's the over-simple dependence
>> structure
>> that is the problem with what's delivered by the
>> Package-Which-Must-Not-Be-Named,
>> how might one go about being less simple-minded? I.e. what
>> might be
>> some
>> more realistic dependence structures, and how would one
>> specify these
>> in lmer?
>> And how would one assess whether the assumed dependence
>> structure
>> gives
>> a reasonable fit to the data?
>>
>>> If you can describe how many variance components you think should be
>>> estimated in your model and what they would represent then I think
>>> it
>>> will be easier to describe how to fit the model.
>>
>> How does this fit in with my conjecture (above) about what
>> I've been
>> missing all these years? Does it fit? How many variance
>> components
>> are there in the ``naive'' model? It looks like 5 to me ... but
>> maybe
>> I'm totally out to lunch in what I think I'm understanding at
>> this
>> stage.
>> (And besides --- there are three sorts of statistician; those
>> who can
>> count,
>> and those who can't.)
>>
>> Thank you for your indulgence.
>>
>> cheers,
>>
>> Rolf Turner
>>
>> ######################################################################
>> Attention:This e-mail message is privileged and confidential. If
>> you are not
>> theintended recipient please delete the message and notify the
>> sender.Any
>> views or opinions presented are solely those of the author.
>>
>> This e-mail has been scanned and cleared by
>> MailMarshalwww.marshalsoftware.com
>> ######################################################################
>>
> <Machines.pdf>_______________________________________________
> R-sig-mixed-models at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
More information about the R-sig-mixed-models
mailing list