[R-sig-ME] seeking some advice on fixed vs random specification
Derek Dunfield
dunfield at mit.edu
Tue Nov 1 20:57:26 CET 2011
Could this be accomplished by using an explicitly nested REGION:TIME
variable as a random effect and still keeping REGION and TIME
variables as fixed effects?
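
A minimal sketch of what I have in mind, in case it helps. It reads my question as "fixed main effects for TIME and REGION, with the interaction as a random intercept", which is only one possible interpretation. The data frame `fish` is simulated (all values are made up, and the number of regions is cut to 8 to keep it quick); only the column names follow David's post.

```r
library(lme4)

set.seed(1)

# Simulated stand-in for the real data: every value below is invented.
n_sites <- 120
sites <- data.frame(
  wb_id   = factor(seq_len(n_sites)),
  REGION  = factor(sample(paste0("R", 1:8), n_sites, replace = TRUE)),
  WB_TYPE = factor(sample(c("lake", "river"), n_sites, replace = TRUE))
)

# One row per fish; each fish inherits the region and type of its site.
fish <- sites[sample(n_sites, 2000, replace = TRUE), ]
fish$TIME <- factor(sample(c("A", "B", "C"), nrow(fish), replace = TRUE))
fish$PORT <- factor(sample(c("fillet", "whole"), nrow(fish), replace = TRUE))
fish$len  <- as.numeric(scale(rnorm(nrow(fish), 40, 8), scale = FALSE))

# Site-level and REGION:TIME-level deviations on the log scale.
site_eff <- rnorm(n_sites, 0, 0.4)
rt       <- interaction(fish$REGION, fish$TIME, drop = TRUE)
rt_eff   <- rnorm(nlevels(rt), 0, 0.3)

fish$CONT <- exp(-1 + 0.01 * fish$len +
                 site_eff[as.integer(fish$wb_id)] +
                 rt_eff[as.integer(rt)] +
                 rnorm(nrow(fish), 0, 0.5))

# Fixed main effects for TIME and REGION; the interaction enters only
# as a random intercept, so it is not also estimated as a fixed effect.
m_int <- lmer(log(CONT) ~ TIME + REGION + WB_TYPE + PORT * len +
                (1 | REGION:TIME) + (1 | wb_id),
              data = fish)

VarCorr(m_int)
```

Note that the fixed part here is TIME + REGION rather than TIME*REGION; keeping the fixed interaction alongside (1 | REGION:TIME) would put the same term in both parts of the model.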
On Tue, Nov 1, 2011 at 3:35 PM, Douglas Bates <bates at stat.wisc.edu> wrote:
> On Tue, Nov 1, 2011 at 1:03 PM, david depew <david.depew at queensu.ca> wrote:
>> Thanks for your response Peter,
>>
>> I see what you mean about the nesting vs crossed.
>>
>> My understanding (perhaps incorrect) is that by treating TIME and REGION as
>> fixed effects (i.e. the first formulation)
>>
>> lmer(log(CONT) ~ TIME*REGION + WB_TYPE + PORT*len + (1|wb_id))
>>
>> the specific contrasts of interest (i.e. REGION1 in TIME A vs REGION1 in
>> TIME B) could be accomplished, but that this would not be possible if TIME
>> and REGION were treated as random effects.
>>
>> Are you suggesting that they should be treated as both "fixed" and
>> "random"? i.e.
>>
>> lmer(log(CONT) ~ TIME*REGION + WB_TYPE + PORT*len + (1|TIME) +
>> (1|REGION/wb_id))
>
> Not a good idea. As one might expect, the fixed and random effects for
> the same factor end up confounded with one another.
>>
>>
>> On Tue, Nov 1, 2011 at 11:56 AM, Peter Claussen <dakotajudo at mac.com> wrote:
>>
>>> David,
>>>
>>> Have you considered that if TIME*REGION is crossed as fixed effects, you
>>> should also treat them as crossed if they are random effects (and not
>>> nested), thus
>>>
>>> lmer(log(CONT) ~ WB_TYPE + PORT*len + (1|TIME) + (1|REGION/wb_id))
>>>
>>> Are TIME and REGION considered to be two independent sources of random
>>> variation, which would be implied by this model?
>>>
>>> If you want to model variation across time differently for each region, then
>>> perhaps (TIME | REGION/wb_id) may be more appropriate.
>>>
>>> I would interpret (1|TIME/REGION), based on analogy to (1 | BLOCK/PLOT),
>>> to mean the REGION identified as "1" in TIME "A" would not be in any way
>>> related to REGION "1" in TIME "B"; that is, the region identifier only has
>>> meaning within the context of the time identifier.
>>>
>>> Peter Claussen
>>> Gylling Data Management
>>>
>>> On Oct 31, 2011, at 9:27 AM, david depew wrote:
>>>
>>> > Dear list,
>>> > I am seeking some thoughts/advice on whether my approach to this problem
>>> > (below) makes sense.
>>> >
>>> > We have compiled a rather large dataset (n > 25,000 for most species of
>>> > interest) on the levels of a contaminant in fish covering 40 years and a
>>> > continental scale. We would like to investigate broad temporal changes
>>> > across a large geographic region. Because the data comes from a variety of
>>> > sources, with different resources and mandates for sampling fish, we do not
>>> > consider this dataset to be a "true random sample", but in the absence of
>>> > such, this is the best possible approximation to one.
>>> >
>>> > Sites that are sampled over time are generally not sampled frequently
>>> > enough and with sufficient constraints (sample sizes, sizes of fish) to do
>>> > more focused analysis of temporal trends.
>>> >
>>> > Having spent some time perusing the resources available on mixed models, I
>>> > think this offers the best choice for making some sense of this messy
>>> > dataset. I'm less inclined to try and estimate site specific slopes
>>> > (regressed over year) for sites that have low sampling effort.
>>> >
>>> > Rather, I split the dataset into time periods (A,B and C) of ~ 15 year
>>> > blocks. (Note: the levels of this particular contaminant are known to
>>> > change very slowly over time), and assigned each site to an ecoregion
>>> > based on geographic location. Thus, I am aiming to assess (if possible)
>>> > whether levels of contaminant in each ecoregion change over the time blocks
>>> > (A,B and C), where sites are assumed to represent a random selection of
>>> > possible locations within an ecoregion.
>>> >
>>> > The variables of interest are as follows;
>>> > CONT=contaminant Conc.
>>> > WB_TYPE = waterbody type (lake, river)
>>> > PORT = portion (fillet, whole fish)
>>> > len=mean centered length of fish
>>> > REGION=Ecoregion (37 unique types)
>>> > TIME= time block (A, B or C)
>>> > wb_id=unique id of site
>>> >
>>> > My initial thought was to specify the model with time and region as fixed
>>> > effects.
>>> >
>>> > lmer(log(CONT) ~ TIME*REGION + WB_TYPE + PORT*len + (1|wb_id))
>>> >
>>> > Comparison of this model with one with only additive time and region terms
>>> > suggests that the interaction improves the model fit and is probably
>>> > important.
>>> >
>>> > I can test TIME and REGION interaction contrasts specifically using the
>>> > multcomp package and the results indeed suggest some regions have
>>> > significant changes between time blocks.
>>> >
>>> > Or,
>>> >
>>> > would it make more sense to specify the time and region effects as part of
>>> > the random terms with site nested within region, nested within time period?
>>> >
>>> > lmer(log(CONT) ~ WB_TYPE + PORT*len + (1|TIME/REGION/wb_id))
>>> >
>>> > I'm assuming (perhaps wrongly) that the conditional means and 95% CI could
>>> > be extracted and compared to assess changes within a region?
>>> >
>>> > I'm aware that there are arguments that can be made to treat TIME and
>>> > REGION as either fixed or random, depending on the objective of the
>>> > analysis. I'm mainly seeking some clarification if a) my interpretation of
>>> > the specified model is correct, and b) if this makes sense with respect to
>>> > the initial problem.
>>> >
>>> > Any thoughts or advice would be much appreciated.
>>> >
>>> > thanks
>>> >
>>> >
>>> >
>>> > --
>>> > David Depew
>>> > Postdoctoral Fellow
>>> > School of Environmental Studies
>>> > Queen's University
>>> > Kingston, Ontario
>>> > K7L 3N6
>>> >
>>> > david.depew at queensu.ca
>>> >
>>> > _______________________________________________
>>> > R-sig-mixed-models at r-project.org mailing list
>>> > https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models