[R-sig-ME] seeking some advice on fixed vs random specification

Wed Nov 2 00:54:22 CET 2011

Think about the implications in terms of statistical distributions, Derek.

If the two variables are fixed, so is their interaction. How could it 
become a random variable?

If at least one of the variables involved is random, then 
interactions would also be random.

Fixed variables are deterministic. Random variables are sampled from
a population, and their distribution is modeled in the fit.
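
For concreteness, here is a minimal sketch of how the two kinds of
specification are written in lme4. Everything in it is an assumption for
illustration: the data frame `d`, the response `y`, the layout (6 regions,
5 sites per region) and the effect sizes are simulated, not taken from
David's data. It only shows what each formula estimates, not which one
suits the problem.

```r
library(lme4)

set.seed(1)
# Hypothetical balanced layout: 3 time blocks x 6 regions x 5 sites x 4 fish.
d <- expand.grid(TIME = c("A", "B", "C"),
                 REGION = paste0("R", 1:6),
                 site = 1:5, fish = 1:4)
d$wb_id <- interaction(d$REGION, d$site, drop = TRUE)   # site within region
cell    <- interaction(d$REGION, d$TIME, drop = TRUE)   # region-by-time cell

# Simulated response: site effect + region-by-time cell effect + noise.
site_eff <- rnorm(nlevels(d$wb_id), sd = 0.5)
cell_eff <- rnorm(nlevels(cell), sd = 0.3)
d$y <- 1 + site_eff[as.integer(d$wb_id)] + cell_eff[as.integer(cell)] +
  rnorm(nrow(d), sd = 0.2)

# (a) TIME, REGION and their interaction all fixed: one coefficient per cell.
m_fixed <- lmer(y ~ TIME * REGION + (1 | wb_id), data = d)

# (b) Main effects fixed, interaction entered as a random intercept per cell.
m_rand <- lmer(y ~ TIME + REGION + (1 | REGION:TIME) + (1 | wb_id), data = d)

length(fixef(m_fixed))   # number of fixed coefficients in (a)
length(fixef(m_rand))    # number of fixed coefficients in (b)
VarCorr(m_rand)          # variance components, including REGION:TIME
```

In (a) every region-by-time cell gets its own fixed coefficient, so
specific contrasts between cells can be tested. In (b) the cell effects
are treated as draws from one distribution and summarized by a single
variance.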

At 02:57 PM 11/1/2011, Derek Dunfield wrote:
>Could this be accomplished by using an explicitly nested REGION:TIME
>variable as a random effect and still keeping REGION and TIME
>variables as fixed effects?
>
>
>On Tue, Nov 1, 2011 at 3:35 PM, Douglas Bates <bates at stat.wisc.edu> wrote:
> > On Tue, Nov 1, 2011 at 1:03 PM, david depew <david.depew at queensu.ca> wrote:
> >> Thanks for your response Peter,
> >>
> >> I see what you mean about the nesting vs crossed.
> >>
> >> My understanding (perhaps incorrect) is that by treating TIME and REGION
> >> as fixed effects (i.e. the first formulation)
> >>
> >> lmer(log(CONT) ~ TIME*REGION + WB_TYPE + PORT*len + (1|wb_id))
> >>
> >> the specific contrasts of interest (i.e. REGION1 in TIME A vs REGION1 in
> >> TIME B) could be accomplished, but that this would not be possible if TIME
> >> and REGION were treated as random effects.
> >>
> >> Are you suggesting that they should be treated as both "fixed" and
> >> "random"? i.e.
> >>
> >> lmer(log(CONT) ~ TIME*REGION + WB_TYPE + PORT*len + (1|TIME) +
> >> (1|REGION/wb_id))
> >
> > Not a good idea.  As one might expect, the fixed and random effects for
> > the same factor end up confounded with one another.
> >>
> >>
> >> On Tue, Nov 1, 2011 at 11:56 AM, Peter Claussen <dakotajudo at mac.com> wrote:
> >>
> >>> David,
> >>>
> >>> Have you considered that if TIME*REGION is crossed as fixed effects, you
> >>> should also treat them as crossed if they are random effects (and not
> >>> nested), thus
> >>>
> >>> lmer(log(CONT) ~ WB_TYPE + PORT*len + (1|TIME) + (1|REGION/wb_id))
> >>>
> >>> Are TIME and REGION considered to be two independent sources of random
> >>> variation, which would be implied by this model?
> >>>
> >>> If you want to model variation across time differently for each region,
> >>> then perhaps (TIME | REGION/wb_id) may be more appropriate.
> >>>
> >>> I would interpret (1|TIME/REGION), based on analogy to (1 | BLOCK/PLOT),
> >>> to mean the REGION identified as "1" in TIME "A" would not be in any way
> >>> related to REGION "1" in TIME "B"; that is, the region identifier only
> >>> has meaning within the context of the time identifier.
> >>>
> >>> Peter Claussen
> >>> Gylling Data Management
> >>>
> >>> On Oct 31, 2011, at 9:27 AM, david depew wrote:
> >>>
> >>> > Dear list,
> >>> > I am seeking some thoughts/advice on whether my approach to this problem
> >>> > (below) makes sense.
> >>> >
> >>> > We have compiled a rather large dataset (n > 25,000 for most species of
> >>> > interest) on the levels of a contaminant in fish covering 40 years and a
> >>> > continental scale. We would like to investigate broad temporal changes
> >>> > across a large geographic region. Because the data comes from a variety of
> >>> > sources, with different resources and mandates for sampling fish, we do not
> >>> > consider this dataset to be a "true random sample", but in the absence of
> >>> > such, this is the best possible approximation to one.
> >>> >
> >>> > Sites that are sampled over time are generally not sampled frequently
> >>> > enough and with sufficient constraints (sample sizes, sizes of fish) to do
> >>> > more focused analysis of temporal trends.
> >>> >
> >>> > Having spent some time perusing the resources available on mixed models, I
> >>> > think this offers the best choice for making some sense of this messy
> >>> > dataset. I'm less inclined to try and estimate site-specific slopes
> >>> > (regressed over year) for sites that have low sampling effort.
> >>> >
> >>> > Rather, I split the dataset into time periods (A, B and C) of ~15-year
> >>> > blocks (note: the levels of this particular contaminant are known to
> >>> > change very slowly over time) and assigned each site to an ecoregion
> >>> > based on geographic location. Thus, I am aiming to assess (if possible)
> >>> > whether levels of contaminant in each ecoregion change over the time blocks
> >>> > (A, B and C), where sites are assumed to represent a random selection of
> >>> > possible locations within an ecoregion.
> >>> >
> >>> > The variables of interest are as follows:
> >>> > CONT = contaminant concentration
> >>> > WB_TYPE = waterbody type (lake, river)
> >>> > PORT = portion (fillet, whole fish)
> >>> > len = mean-centered length of fish
> >>> > REGION = ecoregion (37 unique types)
> >>> > TIME = time block (A, B or C)
> >>> > wb_id = unique id of site
> >>> >
> >>> > My initial thought was to specify the model with time and region as fixed
> >>> > effects.
> >>> >
> >>> > lmer(log(CONT) ~ TIME*REGION + WB_TYPE + PORT*len + (1|wb_id))
> >>> >
> >>> > Comparison of this model with one with only additive time and region terms
> >>> > suggests that the interaction improves the model fit and is probably
> >>> > important.
> >>> >
> >>> > I can test TIME and REGION interaction contrasts specifically using the
> >>> > multcomp package and the results indeed suggest some regions have
> >>> > significant changes between time blocks.
> >>> >
> >>> > Or,
> >>> >
> >>> > would it make more sense to specify the time and region effects as part of
> >>> > the random terms with site nested within region, nested within time period?
> >>> >
> >>> > lmer(log(CONT) ~ WB_TYPE + PORT*len + (1|TIME/REGION/wb_id))
> >>> >
> >>> > I'm assuming (perhaps wrongly) that the conditional means and 95% CI could
> >>> > be extracted and compared to assess changes within a region?
> >>> >
> >>> > I'm aware that there are arguments that can be made to treat TIME and
> >>> > REGION as either fixed or random, depending on the objective of the
> >>> > analysis. I'm mainly seeking some clarification if a) my interpretation of
> >>> > the specified model is correct, and b) if this makes sense with respect to
> >>> > the initial problem.
> >>> >
> >>> > Any thoughts or advice would be much appreciated.
> >>> >
> >>> > thanks
> >>> >
> >>> >
> >>> >
> >>> > --
> >>> > David Depew
> >>> > Postdoctoral Fellow
> >>> > School of Environmental Studies
> >>> > Queen's University
> >>> > Kingston, Ontario
> >>> > K7L 3N6
> >>> >
> >>> > david.depew at queensu.ca
> >>> >
> >>> > _______________________________________________
> >>> > R-sig-mixed-models at r-project.org mailing list
> >>> > https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
> >>>
> >>>
> >>
> >>

================================================================
Robert A. LaBudde, PhD, PAS, Dpl. ACAFS  e-mail: ral at lcfltd.com
Least Cost Formulations, Ltd.            URL: http://lcfltd.com/
824 Timberlake Drive                     Tel: 757-467-0954
Virginia Beach, VA 23464-3239            Fax: 757-467-2947

"Vere scire est per causas scire"