[R-sig-ME] seeking some advice on fixed vs random specification

Tue Nov 1 21:19:43 CET 2011

On Tue, Nov 1, 2011 at 2:57 PM, Derek Dunfield <dunfield at mit.edu> wrote:
> Could this be accomplished by using an explicitly nested REGION:TIME
> variable as a random effect and still keeping REGION and TIME
> variables as fixed effects?

I think it could, but that would be a bit unusual.  I am more
accustomed to having fixed effects for one covariate, random effects
for another and random effects for the interaction, as in
(1|REGION:TIME).
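A minimal sketch of the structure described in the reply above. It assumes lme4 is installed and uses invented, simulated data in place of the real contaminant dataset: `y` stands in for `log(CONT)`, and the covariates WB_TYPE, PORT and len are omitted for brevity. TIME stays a fixed effect, while REGION and the REGION:TIME cells get random intercepts alongside the site term.

```r
library(lme4)

set.seed(1)
# Hypothetical balanced layout: 10 regions x 5 sites x 3 time blocks x 4 fish.
d <- expand.grid(fish = 1:4, site = 1:5, TIME = c("A", "B", "C"),
                 REGION = paste0("R", 1:10))          # character columns become factors
d$wb_id <- factor(paste(d$REGION, d$site, sep = "_"))  # site ids unique within region
cell <- interaction(d$REGION, d$TIME)                  # region-by-time cells

# Simulated response: fixed time trend plus region, cell, site and residual noise.
d$y <- 1 + c(0, -0.3, -0.6)[as.integer(d$TIME)] +
  rnorm(nlevels(d$REGION), sd = 0.5)[as.integer(d$REGION)] +
  rnorm(nlevels(cell), sd = 0.2)[as.integer(cell)] +
  rnorm(nlevels(d$wb_id), sd = 0.3)[as.integer(d$wb_id)] +
  rnorm(nrow(d), sd = 0.4)

# Fixed effect for TIME; random intercepts for REGION, REGION:TIME and site.
fit <- lmer(y ~ TIME + (1 | REGION) + (1 | REGION:TIME) + (1 | wb_id), data = d)
fixef(fit)    # intercept and two time contrasts
VarCorr(fit)  # one variance per grouping factor plus the residual
```

With this layout, region-specific changes over time show up as the REGION:TIME variance component rather than as a set of fixed interaction coefficients.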
>
> On Tue, Nov 1, 2011 at 3:35 PM, Douglas Bates <bates at stat.wisc.edu> wrote:
>> On Tue, Nov 1, 2011 at 1:03 PM, david depew <david.depew at queensu.ca> wrote:
>>> Thanks for your response Peter,
>>>
>>> I see what you mean about the nesting vs crossed.
>>>
>>> My understanding (perhaps incorrect) is that by treating TIME and REGION as
>>> fixed effects (i.e. the first formulation)
>>>
>>> lmer(log(CONT) ~ TIME*REGION + WB_TYPE + PORT*len + (1|wb_id))
>>>
>>> the specific contrasts of interest (i.e. REGION1 in TIME A vs REGION1 in
>>> TIME B) could be accomplished, but that this would not be possible if TIME
>>> and REGION were treated as random effects.
>>>
>>> Are you suggesting that they should be treated as both "fixed" and
>>> "random"? i.e.
>>>
>>> lmer(log(CONT) ~ TIME*REGION + WB_TYPE + PORT*len + (1|TIME) +
>>> (1|REGION/wb_id))
>>
>> Not a good idea.  As one might expect, the fixed and random effects for
>> the same factor end up confounded with one another.
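One way to see this confounding, sketched in base R on a small hypothetical balanced layout (the layout is invented for illustration): the indicator columns that `(1|TIME)` and `(1|REGION)` would add lie entirely in the column space of the `TIME*REGION` fixed-effects design. Appending them does not increase the rank, so the data cannot separate those random effects from the fixed ones.

```r
# Hypothetical balanced layout: 3 time blocks x 4 regions x 5 fish per cell.
d <- expand.grid(fish = 1:5, TIME = c("A", "B", "C"),
                 REGION = paste0("R", 1:4))

X  <- model.matrix(~ TIME * REGION, data = d)  # fixed-effects design (12 columns)
Zt <- model.matrix(~ 0 + TIME, data = d)       # indicators behind (1 | TIME)
Zr <- model.matrix(~ 0 + REGION, data = d)     # indicators behind (1 | REGION)

rank_X  <- qr(X)$rank
rank_XZ <- qr(cbind(X, Zt, Zr))$rank
c(rank_X = rank_X, rank_XZ = rank_XZ)  # both 12: Zt and Zr add no new directions
```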
>>>
>>>
>>> On Tue, Nov 1, 2011 at 11:56 AM, Peter Claussen <dakotajudo at mac.com> wrote:
>>>
>>>> David,
>>>>
>>>> Have you considered that if TIME*REGION is crossed as fixed effects, you
>>>> should also treat them as crossed if they are random effects (and not
>>>> nested), thus
>>>>
>>>> lmer(log(CONT) ~ WB_TYPE + PORT*len + (1|TIME) + (1|REGION/wb_id))
>>>>
>>>> Are TIME and REGION considered to be two independent sources of random
>>>> variation, which would be implied by this model?
>>>>
>>>> If you want to model variation across time differently for each region,
>>>> then (TIME | REGION/wb_id) may be more appropriate.
>>>>
>>>> I would interpret (1|TIME/REGION), based on analogy to (1 | BLOCK/PLOT),
>>>> to mean the REGION identified as "1" in TIME "A" would not be in any way
>>>> related to REGION "1" in TIME "B"; that is, the region identifier only has
>>>> meaning within the context of the time identifier.
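To make the nested/crossed distinction concrete, here is a sketch assuming lme4 and simulated data (names follow the thread; the values are invented). The nested term `(1|TIME/REGION)` is shorthand for `(1|TIME) + (1|TIME:REGION)`, so its second grouping factor has one level per region-within-time cell. The crossed specification keeps a single level per region, shared across time blocks.

```r
library(lme4)

set.seed(2)
# Hypothetical layout: 3 time blocks x 10 regions x 8 fish per cell.
d <- expand.grid(fish = 1:8, TIME = c("A", "B", "C"),
                 REGION = paste0("R", 1:10))
d$y <- rnorm(3, sd = 0.5)[as.integer(d$TIME)] +
  rnorm(10, sd = 0.5)[as.integer(d$REGION)] +
  rnorm(nrow(d), sd = 0.4)

nested  <- lmer(y ~ 1 + (1 | TIME/REGION), data = d)
crossed <- lmer(y ~ 1 + (1 | TIME) + (1 | REGION), data = d)

ngrps(nested)   # TIME (3 levels) and region-within-time (30 levels)
ngrps(crossed)  # TIME (3 levels) and REGION (10 levels shared across blocks)
```

In the nested fit, "R1 in A" and "R1 in B" are separate, unrelated groups, which is exactly the interpretation given above.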
>>>>
>>>> Peter Claussen
>>>> Gylling Data Management
>>>>
>>>> On Oct 31, 2011, at 9:27 AM, david depew wrote:
>>>>
>>>> > Dear list,
>>>> > I am seeking some thoughts/advice on whether my approach to this problem
>>>> > (below) makes sense.
>>>> >
>>>> > We have compiled a rather large dataset (n > 25,000 for most species of
>>>> > interest) on the levels of a contaminant in fish covering 40 years and a
>>>> > continental scale. We would like to investigate broad temporal changes
>>>> > across a large geographic region. Because the data come from a variety
>>>> > of sources, with different resources and mandates for sampling fish, we
>>>> > do not consider this dataset to be a "true random sample", but in the
>>>> > absence of such, this is the best possible approximation to one.
>>>> >
>>>> > Sites that are sampled over time are generally not sampled frequently
>>>> > enough and with sufficient constraints (sample sizes, sizes of fish) to
>>>> > do more focused analysis of temporal trends.
>>>> >
>>>> > Having spent some time perusing the resources available on mixed models,
>>>> > I think this approach offers the best choice for making some sense of
>>>> > this messy dataset. I'm less inclined to try and estimate site-specific
>>>> > slopes (regressed over year) for sites that have low sampling effort.
>>>> >
>>>> > Rather, I split the dataset into three time blocks (A, B and C) of
>>>> > roughly 15 years each (note: the levels of this particular contaminant
>>>> > are known to change very slowly over time) and assigned each site to an
>>>> > ecoregion based on geographic location. Thus, I am aiming to assess (if
>>>> > possible) whether levels of contaminant in each ecoregion change over
>>>> > the time blocks, where sites are assumed to represent a random selection
>>>> > of possible locations within an ecoregion.
>>>> >
>>>> > The variables of interest are as follows;
>>>> > CONT=contaminant Conc.
>>>> > WB_TYPE = waterbody type (lake, river)
>>>> > PORT = portion (fillet, whole fish)
>>>> > len=mean centered length of fish
>>>> > REGION=Ecoregion (37 unique types)
>>>> > TIME= time block (A, B or C)
>>>> > wb_id=unique id of site
>>>> >
>>>> > My initial thought was to specify the model with time and region as fixed
>>>> > effects.
>>>> >
>>>> > lmer(log(CONT) ~ TIME*REGION + WB_TYPE + PORT*len + (1|wb_id))
>>>> >
>>>> > Comparison of this model with one containing only additive time and
>>>> > region terms suggests that the interaction improves the model fit and
>>>> > is probably important.
>>>> >
>>>> > I can test TIME and REGION interaction contrasts specifically using the
>>>> > multcomp package and the results indeed suggest some regions have
>>>> > significant changes between time blocks.
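A sketch of the fixed-effects route, assuming lme4 and simulated data. The covariates WB_TYPE, PORT and len are omitted, and `y` stands in for `log(CONT)`. The additive and interaction models are compared with `anova()`, which refits both by maximum likelihood before the likelihood-ratio test. A single contrast (time block B versus A within region R2) is then computed by hand as a Wald estimate from `fixef()` and `vcov()`. The same one-row contrast matrix could be passed to `multcomp::glht()` as `linfct`, which should give the same estimate and standard error.

```r
library(lme4)

set.seed(3)
# Hypothetical layout: 6 regions x 4 sites x 3 time blocks x 5 fish.
d <- expand.grid(fish = 1:5, site = 1:4, TIME = c("A", "B", "C"),
                 REGION = paste0("R", 1:6))
d$wb_id <- factor(paste(d$REGION, d$site, sep = "_"))

# Simulated response: common time trend, an extra decline in R2 after block A,
# plus site and residual noise.
trend <- c(0, -0.4, -0.8)[as.integer(d$TIME)]
extra <- ifelse(d$REGION == "R2" & d$TIME != "A", -0.5, 0)
d$y <- 1 + trend + extra +
  rnorm(nlevels(d$wb_id), sd = 0.3)[as.integer(d$wb_id)] +
  rnorm(nrow(d), sd = 0.4)

fit_add <- lmer(y ~ TIME + REGION + (1 | wb_id), data = d)
fit_int <- lmer(y ~ TIME * REGION + (1 | wb_id), data = d)
a <- anova(fit_add, fit_int)  # refits both by ML, then likelihood-ratio test
a

# Wald contrast: region R2 in block B minus region R2 in block A.
# Under the default treatment contrasts this is TIMEB + TIMEB:REGIONR2.
b <- fixef(fit_int)
stopifnot(all(c("TIMEB", "TIMEB:REGIONR2") %in% names(b)))
L <- setNames(numeric(length(b)), names(b))
L[c("TIMEB", "TIMEB:REGIONR2")] <- 1
est <- sum(L * b)
se  <- sqrt(drop(t(L) %*% as.matrix(vcov(fit_int)) %*% L))
c(estimate = est, se = se, z = est / se)
```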
>>>> >
>>>> > Or,
>>>> >
>>>> > would it make more sense to specify the time and region effects as part
>>>> > of the random terms, with site nested within region, nested within time
>>>> > period?
>>>> >
>>>> > lmer(log(CONT) ~ WB_TYPE + PORT*len + (1|TIME/REGION/wb_id))
>>>> >
>>>> > I'm assuming (perhaps wrongly) that the conditional means and 95% CIs
>>>> > could be extracted and compared to assess changes within a region?
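Assuming lme4 and simulated data again, the conditional modes of the nested model and their conditional standard deviations can be pulled out with `ranef()` and its `as.data.frame()` method. `condval ± 1.96 * condsd` then gives an approximate 95% interval for each predicted random effect. These intervals describe predicted random effects, not fixed parameters, and the modes are shrunk toward zero, so comparing them across time blocks is only a rough guide.

```r
library(lme4)

set.seed(4)
# Hypothetical layout: 3 time blocks x 6 regions x 4 sites x 5 fish.
d <- expand.grid(fish = 1:5, site = 1:4, REGION = paste0("R", 1:6),
                 TIME = c("A", "B", "C"))
d$wb_id <- factor(paste(d$REGION, d$site, sep = "_"))
cell <- interaction(d$TIME, d$REGION)

d$y <- 1 + rnorm(3, sd = 0.4)[as.integer(d$TIME)] +
  rnorm(nlevels(cell), sd = 0.3)[as.integer(cell)] +
  rnorm(nlevels(d$wb_id), sd = 0.2)[as.integer(d$wb_id)] +
  rnorm(nrow(d), sd = 0.4)

# Site within region within time block, as in the second formulation.
fit <- lmer(y ~ 1 + (1 | TIME/REGION/wb_id), data = d)

# One row per level of each grouping factor: conditional mode and its sd.
rr <- as.data.frame(ranef(fit, condVar = TRUE))
rr$lower <- rr$condval - 1.96 * rr$condsd
rr$upper <- rr$condval + 1.96 * rr$condsd
table(rr$grpvar)  # number of levels in each of the three grouping factors
head(rr)
```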
>>>> >
>>>> > I'm aware that there are arguments that can be made to treat TIME and
>>>> > REGION as either fixed or random, depending on the objective of the
>>>> > analysis. I'm mainly seeking some clarification on whether a) my
>>>> > interpretation of the specified model is correct, and b) this makes
>>>> > sense with respect to the initial problem.
>>>> >
>>>> > Any thoughts or advice would be much appreciated.
>>>> >
>>>> > thanks
>>>> >
>>>> >
>>>> >
>>>> > --
>>>> > David Depew
>>>> > Postdoctoral Fellow
>>>> > School of Environmental Studies
>>>> > Queen's University
>>>> > Kingston, Ontario
>>>> > K7L 3N6
>>>> >
>>>> > david.depew at queensu.ca
>>>> >
>>>> >
>>>> > _______________________________________________
>>>> > R-sig-mixed-models at r-project.org mailing list
>>>> > https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>>>>
>>>>
>>>
>>
>>
>