[R-sig-ME] Model specification: crossed vs nested factors

Wed Aug 25 00:26:32 CEST 2010

Dennis Murphy wrote:
> Hi:
>
> No direct answers, but some questions...
>
> On Tue, Aug 24, 2010 at 2:01 PM, Richard Feldman <
> richard.feldman at mail.mcgill.ca> wrote:
>
>   
>> Hello,
>>
>> I am at my wit's end with regards to specifying my model, perhaps because I
>> am confused about nested vs. crossed grouping factors.
>>
>> My dataset has 16 sites and within each site I applied 3 treatments (A, B,
>> C). The sites differ based on elevation. I originally thought I had a
>> hierarchical (nested) model and specified the full model as such:
>>
>> How did you assign treatments within site? Were they assigned to divisions
>>     
> of a site (e.g., subplots) or were they assigned to the entire site at
> different times, or ???  This matters in the analysis...a lot.
>   

  Yes. Telling us the total number of observations would help a lot.  Is
it 48, or more?
>
>
>   
>> model.n <- glmer(Y ~ Treatment*Elevation + (Treatment|Site), data=Data)
>>
>> The data also seemed analogous to a longitudinal model where instead of
>> subject I have site and instead of time/days I have treatment. I am not
>> totally clear on why this analogy breaks down.
>>
>>     
>
> Longitudinal models involve a time element, usually within subject/primary
> unit, and constitute repeated measurements on that unit over time with the
> same treatment conditions and possibly time-varying covariates. How would
> such a scenario correspond to your design?
>
>
>   
>> After extensive reading, it seems that because each site receives the same
>> three treatments, my model is crossed and not nested. Hence the
>> specification should be:
>>
>>     
>
> It's not obvious at this point whether you have crossed or nested effects.
> It's entirely possible that site could be a blocking factor. Go back to the
> initial question.
>
>
>   
>> model.c <- glmer(Y ~ Treatment*Elevation + (1|Site) + (1|Treatment),
>> data=Data)
>>
>> I have three questions:
>>
>> 1. Is model.c indeed the correct specification given my data?
>>
>> 2. Given model.c, does the treatment by elevation interaction capture this
>> cross-scale effect, even though the former is a level-1 predictor (varies
>> within site) and the latter a level-2 predictor (varies among sites)?
>>
>> 3. The output from model.c gives zero variance for the random effect of
>> treatment. I assume this is because there are only three levels. Hence,
>> treatment can only be a fixed variable.
  You typically wouldn't want treatment to by a random variable anyway,
would you?  Unless it's not
a typical treatment ...
>>  I have no problem with that. What I
>> am confused about is how I can discover how much the treatment-response
>> relationship varies among sites. I originally thought that (Treatment|Site)
>> made sense because the treatment-response slope could vary based on site.
>>
>>     
  I'm guessing here, but let's suppose that you do indeed have 48 total
observations, of 3 treatments each at 16 sites, each site having its own
elevation value (i.e. regression design), but the treatments at each
site are unreplicated, and share the same 'elevation' value.

     Then I would say that

glmer(Y~Treatment*Elevation + (1|Site), family=..., data=...)

is the appropriate specification.  The fixed effects evaluate whether
there is a trend (linear on the scale of the linear predictor) with
elevation, whether there is a treatment effect, and whether the
elevational trend varies by treatment.  The Site random effect cleans up
any systematic variation among Site that is not accounted for by the
elevational trends.  I was going to say that with this assumed design,
you don't have enough data to estimate Site-by-Treatment interactions
(i.e. treatment has different effects at different sites), because
Treatment is not replicated within sites, but I'm going to waffle a bit
and say that at the very least the Site:Treatment interaction is *hard*
to identify (whether it's exactly, qualitatively, unidentifiable or not)
because of the lack of replication.
   Suppose we leave elevation out for a moment.  Then we are trying to
decide between Y~Treatment+(1|Site) or  Y~Treatment+(Treatment|Site).  I
think that in a classic linear mixed model we would be in trouble with
the latter model, because our error variance would be indistinguishable
from the treatment:site variance
   If I wanted to try to convince myself one way or the other without
actually thinking about it more carefully, I would
set up a simulation with lots of sites and lots of treatments, but with
unreplicated treatments within sites, and see if I
could estimate anything successfully with a sufficiently large data
set.  Alternatively, I would go back and stare at
some classic book (Quinn and Keough is nice for ecologists) and follow
the logic for regular old mixed models, which
applies in this case too even if the algorithms are different.

  I just saw your other e-mail.
  You say "In Site #1, treatment A was applied, measurements taken, then
treatment B was applied, etc.".  Was the order of the treatments
randomly assigned at each site?  If not, the time effect would be
indistinguishable from the treatment effect. If so, I would consider

  glmer(Y~Treatment*Elevation +Time+ (1|Site), family=..., data=...)

  good luck.