[R-sig-ME] seeking some advice on fixed vs random specification

Tue Nov 1 20:35:42 CET 2011

On Tue, Nov 1, 2011 at 1:03 PM, david depew <david.depew at queensu.ca> wrote:
> Thanks for your response Peter,
>
> I see what you mean about the nesting vs crossed.
>
> My understanding (perhaps incorrect) is that by treating TIME and REGION as
> fixed effects (i.e. the first formulation)
>
> lmer(log(CONT) ~ TIME*REGION + WB_TYPE + PORT*len + (1|wb_id))
>
> the specific contrasts of interest (i.e. REGION1 in TIME A vs REGION1 in
> TIME B) could be accomplished, but that this would not be possible if TIME
> and REGION were treated as random effects.
>
> Are you suggesting that they should be treated as both "fixed" and
> "random"? i.e.
>
> lmer(log(CONT) ~ TIME*REGION + WB_TYPE + PORT*len + (1|TIME) +
> (1|REGION/wb_id))

Not a good idea. As one might expect, the fixed and random effects for
the same factor end up confounded with one another.
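
A small base-R sketch of that confounding, assuming a made-up balanced toy
design (3 time blocks, 4 regions, 2 observations per cell) rather than the
real data. The names X, Z_time and Z_region are illustrative only. It does
not fit a mixed model; it only checks whether the indicator columns that
(1|TIME) and (1|REGION) would use add anything beyond the fixed-effects
design for TIME*REGION.

```r
# Hypothetical toy design: 3 time blocks x 4 regions, 2 observations per cell.
d <- expand.grid(TIME   = factor(c("A", "B", "C")),
                 REGION = factor(paste0("R", 1:4)),
                 rep    = 1:2)

# Fixed-effects design matrix for TIME * REGION.
X <- model.matrix(~ TIME * REGION, data = d)

# Indicator matrices of the kind random intercepts (1 | TIME) and
# (1 | REGION) are built on: one 0/1 column per level.
Z_time   <- model.matrix(~ 0 + TIME,   data = d)
Z_region <- model.matrix(~ 0 + REGION, data = d)

# Compare the rank of X alone with the rank of X plus the indicator columns.
rank_X   <- qr(X)$rank
rank_all <- qr(cbind(X, Z_time, Z_region))$rank

cat("rank of X:                     ", rank_X, "\n")
cat("rank of [X, Z_time, Z_region]: ", rank_all, "\n")
```

The two ranks come out equal: every TIME and REGION indicator column is a
linear combination of columns already in X, so the data cannot separate a
random TIME or REGION intercept from the corresponding fixed effects.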
>
>
> On Tue, Nov 1, 2011 at 11:56 AM, Peter Claussen <dakotajudo at mac.com> wrote:
>
>> David,
>>
>> Have you considered that if TIME*REGION is crossed as fixed effects, you
>> should also treat them as crossed if they are random effects (and not
>> nested), thus
>>
>> lmer(log(CONT) ~ WB_TYPE + PORT*len + (1|TIME) + (1|REGION/wb_id))
>>
>> Are TIME and REGION considered to be two independent sources of random
>> variation, which would be implied by this model?
>>
>> If you want to model variation across time differently for each region,
>> then perhaps (TIME | REGION/wb_id) may be more appropriate.
>>
>> I would interpret (1|TIME/REGION), based on analogy to (1 | BLOCK/PLOT),
>> to mean the REGION identified as "1" in TIME "A" would not be in any way
>> related to REGION "1" in TIME "B"; that is, the region identifier only has
>> meaning within the context of the time identifier.
>>
>> Peter Claussen
>> Gylling Data Management
>>
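
The crossed versus nested reading described above can be checked by counting
grouping levels. This is a base-R sketch on hypothetical toy labels (3 time
blocks, 4 regions instead of the 37 in the thread); it relies on the lme4
convention that (1|TIME/REGION) expands to (1|TIME) + (1|TIME:REGION).

```r
# Hypothetical toy labels: 3 time blocks, 4 regions (the thread has 37).
d <- expand.grid(TIME   = c("A", "B", "C"),
                 REGION = as.character(1:4),
                 stringsAsFactors = TRUE)

# Crossed, (1 | TIME) + (1 | REGION): one grouping factor per variable,
# so region "1" is the same level in every time block.
crossed_levels <- c(TIME = nlevels(d$TIME), REGION = nlevels(d$REGION))

# Nested, (1 | TIME/REGION), i.e. (1 | TIME) + (1 | TIME:REGION):
# the second grouping factor is the TIME-by-REGION combination, so
# region "1" in block A and region "1" in block B are different levels.
time_region   <- interaction(d$TIME, d$REGION, drop = TRUE)
nested_levels <- c(TIME = nlevels(d$TIME),
                   TIME_REGION = nlevels(time_region))

print(crossed_levels)
print(nested_levels)
print(levels(time_region))
```

Under nesting, the region effect gets one level per time-by-region
combination, which matches the interpretation that a region identifier only
has meaning within its time block.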
>> On Oct 31, 2011, at 9:27 AM, david depew wrote:
>>
>> > Dear list,
>> > I am seeking some thoughts/advice on whether my approach to this problem
>> > (below) makes sense.
>> >
>> > We have compiled a rather large dataset (n > 25,000 for most species of
>> > interest) on the levels of a contaminant in fish covering 40 years and a
>> > continental scale. We would like to investigate broad temporal changes
>> > across a large geographic region. Because the data comes from a variety of
>> > sources, with different resources and mandates for sampling fish, we do not
>> > consider this dataset to be a "true random sample", but in the absence of
>> > such, this is the best possible approximation to one.
>> >
>> > Sites that are sampled over time are generally not sampled frequently
>> > enough and with sufficient constraints (sample sizes, sizes of fish) to do
>> > more focused analysis of temporal trends.
>> >
>> > Having spent some time perusing the resources available on mixed models, I
>> > think this offers the best choice for making some sense of this messy
>> > dataset. I'm less inclined to try and estimate site specific slopes
>> > (regressed over year) for sites that have low sampling effort.
>> >
>> > Rather, I split the dataset into time periods (A, B and C) of ~15 year
>> > blocks (note: the levels of this particular contaminant are known to
>> > change very slowly over time), and assigned each site to an ecoregion
>> > based on geographic location. Thus, I am aiming to assess (if possible)
>> > whether levels of contaminant in each ecoregion change over the time blocks
>> > (A, B and C), where sites are assumed to represent a random selection of
>> > possible locations within an ecoregion.
>> >
>> > The variables of interest are as follows:
>> > CONT = contaminant conc.
>> > WB_TYPE = waterbody type (lake, river)
>> > PORT = portion (fillet, whole fish)
>> > len = mean centered length of fish
>> > REGION = ecoregion (37 unique types)
>> > TIME = time block (A, B or C)
>> > wb_id = unique id of site
>> >
>> > My initial thought was to specify the model with time and region as fixed
>> > effects.
>> >
>> > lmer(log(CONT) ~ TIME*REGION + WB_TYPE + PORT*len + (1|wb_id))
>> >
>> > Comparison of this model with one with only additive time and region terms
>> > suggests that the interaction improves the model fit and is probably
>> > important.
>> >
>> > I can test TIME and REGION interaction contrasts specifically using the
>> > multcomp package and the results indeed suggest some regions have
>> > significant changes between time blocks.
>> >
>> > Or,
>> >
>> > would it make more sense to specify the time and region effects as part of
>> > the random terms with site nested within region, nested within time period?
>> >
>> > lmer(log(CONT) ~ WB_TYPE + PORT*len + (1|TIME/REGION/wb_id))
>> >
>> > I'm assuming (perhaps wrongly) that the conditional means and 95% CI could
>> > be extracted and compared to assess changes within a region?
>> >
>> > I'm aware that there are arguments that can be made to treat TIME and
>> > REGION as either fixed or random, depending on the objective of the
>> > analysis. I'm mainly seeking some clarification if a) my interpretation of
>> > the specified model is correct, and b) if this makes sense with respect to
>> > the initial problem.
>> >
>> > Any thoughts or advice would be much appreciated.
>> >
>> > thanks
>> >
>> >
>> >
>> > --
>> > David Depew
>> > Postdoctoral Fellow
>> > School of Environmental Studies
>> > Queen's University
>> > Kingston, Ontario
>> > K7L 3N6
>> >
>> > david.depew at queensu.ca
>> >
>> > _______________________________________________
>> > R-sig-mixed-models at r-project.org mailing list
>> > https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>>
>>