[R-sig-eco] which factor to nest?

Mon Jan 26 05:49:41 CET 2009

Kingsford Jones wrote:
> On Sun, Jan 25, 2009 at 6:43 PM, tavery <trevor.avery at acadiau.ca> wrote:
>> Thanks Ben for a speedy response...
>> I agree that a GLMM is probably more prudent and will investigate that idea
>> further. My guard is against making the analysis too complicated...
>>
>> For interest and discussion: In the minutes between my post and the response
>> I went 'way back' to consider an even simpler design (if it works with
>> unbalanced data). In essence the beekeepers are block factors as the
>> treatments were applied within these blocks to colonies at random and the
>> beekeeper is not of any interest, just the parasite numbers of the colonies.
>> In fundamental design terms, the randomized block design appears a viable
>> option. However there are likely issues with the count data (that I will
>> investigate as I am unaware of the data per se, but do have access to
>> similar data that do, in fact, have lots of zeros).
> 
> Hi Trevor,
> 
> Yes, it sounds as though you have a nice, simple RBD that can be
> analyzed using the code Ben suggested.  The lack of balance shouldn't
> be a problem as long as you use one of the mixed-model functions
> (lmer, lme, glmmPQL, etc) rather than aov.  The fact that you have
> count data shouldn't be a problem, although if you have an excessive
> number of zeros you might want to have a look at the non-CRAN package
> glmmADMB.
> 
> hth,
> 
> Kingsford Jones

  Just a quick point: *estimation* should be fairly straightforward
(easiest with log-transformed data -> lmer, lme, harder with Poisson
data -> glmer, glmmML, glmmAK, hardest with negative
binomial/overdispersed data -> glmmADMB).  Be very careful with
glmmPQL, which is known to be biased with low (<5-10) average counts per
unit.  *Inference* is a can of worms: read all about it on the
r-sig-mixed-models mailing list archive ...
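
  As a rough illustration of the "easiest" route above (log-transform,
then a linear mixed model), here is a minimal sketch on simulated data.
Everything in it is an assumption made for illustration: the data frame
`bees`, the hive numbers per beekeeper, the effect sizes, and the column
names `parasites` and `logparasites` are invented, not taken from the
real data set.

```r
library(nlme)  # recommended package, shipped with standard R installations

set.seed(1)

# Hypothetical unbalanced layout: 7 beekeepers with 4-9 hives each
n_hives <- c(4, 6, 5, 9, 7, 4, 8)
bees <- data.frame(
  Beekeeper = factor(rep(seq_along(n_hives), times = n_hives))
)

# Treatments assigned at random to hives within each beekeeper (block)
bees$antiG <- factor(
  unlist(lapply(n_hives, function(n) {
    sample(rep(c("control", "antiG"), length.out = n))
  })),
  levels = c("control", "antiG")
)

# Invented parasite counts: a random beekeeper effect plus a treatment effect
block_eff <- rnorm(length(n_hives), sd = 0.5)
mu <- exp(3 + block_eff[as.integer(bees$Beekeeper)] -
            0.7 * (bees$antiG == "antiG"))
bees$parasites <- rpois(nrow(bees), mu)

# log(1 + x) keeps any zero counts finite
bees$logparasites <- log(1 + bees$parasites)

# Randomized block design: treatment fixed, Beekeeper as random intercept
fit <- lme(logparasites ~ antiG, random = ~ 1 | Beekeeper, data = bees)
summary(fit)$tTable  # fixed-effect estimates, SEs, df, t- and p-values
```

  The lack of balance needs no special handling here; lme simply uses
however many hives each beekeeper contributes.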

  You may want to forward further questions along these lines
to r-sig-mixed-models at r-project.org instead ...

  Ben Bolker

> 
> 
>> thanks,
>> trevor
>>
>> Ben Bolker wrote:
>>>  My two cents:
>>>
>>>  * a GLMM if parasite numbers are small enough to
>>> have to deal with them as count data (e.g. lots of zeros).
>>> Otherwise (if you're lucky, as GLMMs are harder) most
>>> likely a lognormal -- log-transform data or log(1+x) if
>>> there are some zeros, and treat as a LMM (nlme or lmer).
>>>
>>>  * "Nesting" is more or less a red herring here, only
>>> really has to do with multiple *random* factors (and
>>> then more to do with the coding of the random factors
>>> than with fundamental experimental design distinctions).
>>>
>>>  * so: antiG vs control is fixed, Beekeeper is probably
>>> best treated as random (7 units is enough to make a
>>> random effect plausible: if you had only 2 or 3 you
>>> would probably have to treat as a fixed effect to
>>> make progress)
>>>
>>>  * because unbalanced (and possibly GLMM), aov/sums
>>> of squares approaches are probably not viable
>>>
>>>  * fairly straightforward with nlme, something like
>>>
>>> lme(logparasites ~ antiG, random = ~1|Beekeeper)
>>>
>>> or lme4:
>>>
>>> lmer(logparasites ~ antiG + (1|Beekeeper)) or
>>> (for GLMM, fitted to the untransformed counts, here called parasites)
>>>
>>> glmer(parasites ~ antiG + (1|Beekeeper), family=poisson)
>>>
>>>  * Two more things to watch out for:
>>>
>>>   - lme (nlme package) will give you p-values, lmer (lme4 package)
>>> will not
>>>   - if you end up fitting a GLMM you should definitely
>>> worry about/check for overdispersion
>>>
>>>  Ben Bolker
>>>
>>>
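
  On the overdispersion check mentioned above: one common rough
diagnostic is the sum of squared Pearson residuals divided by the
residual degrees of freedom, where values well above 1 suggest
overdispersion. The sketch below assumes lme4 is installed and uses
simulated, deliberately overdispersed counts; the data frame `bees` and
all numbers in it are invented for illustration.

```r
set.seed(2)

# Hypothetical unbalanced layout: 7 beekeepers with 4-9 hives each
n_hives <- c(4, 6, 5, 9, 7, 4, 8)
bees <- data.frame(
  Beekeeper = factor(rep(seq_along(n_hives), times = n_hives))
)
bees$antiG <- factor(
  unlist(lapply(n_hives, function(n) {
    sample(rep(c("control", "antiG"), length.out = n))
  })),
  levels = c("control", "antiG")
)

# Invented counts drawn from a negative binomial, so they are
# more variable than a Poisson model expects
block_eff <- rnorm(length(n_hives), sd = 0.5)
mu <- exp(2 + block_eff[as.integer(bees$Beekeeper)] -
            0.7 * (bees$antiG == "antiG"))
bees$parasites <- rnbinom(nrow(bees), mu = mu, size = 1.5)

overdispersion_ratio <- NA_real_
if (requireNamespace("lme4", quietly = TRUE)) {
  # Poisson GLMM on the raw counts, Beekeeper as random intercept
  fit <- lme4::glmer(parasites ~ antiG + (1 | Beekeeper),
                     family = poisson, data = bees)
  # Pearson chi-square statistic over residual degrees of freedom
  overdispersion_ratio <-
    sum(residuals(fit, type = "pearson")^2) / df.residual(fit)
}
overdispersion_ratio
```

  This ratio is only an approximate guide for mixed models, since the
residual degrees of freedom are not well defined there.
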
>>> tavery wrote:
>>>
>>>> Hi all,
>>>> Maybe an expert in this particular design could provide insights into an
>>>> interesting question (or possibly just a derailed view). Possibly outside of
>>>> the R world, but has to be sorted out before R code can be generated - which
>>>> should be trivial...
>>>>
>>>> - 7 beekeepers each with several hives
>>>> - some hives treated with antiG, others left as controls
>>>> - unbalanced design (not an equal number of treated or control sites
>>>> among or within beekeepers)
>>>> - measured parasite numbers (average per hive)
>>>> Q: want to know if antiG reduces parasite load
>>>>
>>>> The initial reaction (from a student) was to consider Beekeeper as a
>>>> random factor (although it could be considered fixed), and nest Treatment
>>>> (antiG or control) within Beekeeper. This design is intuitive as Beekeepers
>>>> are 'groups' and hives are 'subgroups' to which treatments are applied. Upon
>>>> some investigation, it appears that the model could be flipped i.e. consider
>>>> Treatment as a fixed factor and nest Beekeeper within Treatment. In this
>>>> latter case, each Beekeeper would be represented in each Treatment and a
>>>> crossed design results i.e. not nested at all. Various texts appear to
>>>> 'arbitrarily' designate factors in similar models (see Zar on drug/drugstore
>>>> example).
>>>>
>>>> a) What design is correct?
>>>> b) What am I missing in the way of determining groups and the ultimate
>>>> design?
>>>>
>>>> thanks in advance,
>>>> trevor
>>>> biology department
>>>> acadia
>>>>

-- 
Ben Bolker
Associate professor, Biology Dep't, Univ. of Florida
bolker at ufl.edu / www.zoology.ufl.edu/bolker
GPG key: www.zoology.ufl.edu/bolker/benbolker-publickey.asc