[R-sig-ME] Mixed Models in a very basic replication design

Mon Dec 18 21:06:54 CET 2017

Tim,

There are many possible reasons why individual pots might respond to treatment differently.

If you plant a number of seeds, will you have 100% germination in each pot? Could differences in percent germination (and hence, say, root density) affect soil responses, both in the initial measurements (an independent intercept per pot) and in the response over time (an independent slope per pot)?

If you have one plant per pot, can you be sure all plants are genetically and physiologically identical, and respond the same way to treatments over time? Or do you need to account for the fact that you’re drawing seed from a random population? I had a problem like this years ago in a field trial with a soybean variety that was still a segregating population.

How you code “Pot” depends on how you execute the experiment. If you have 2 levels of A and 2 levels of B, replicated 5 times, then you should have 2*2*5 = 20 pots, assuming both time measurements are taken from the same pot. Then you could write an ANOVA model of the form

aov(assessment1 ~ TreatmentA * TreatmentB * Time + Replicate + Error(TreatmentA:TreatmentB:Replicate))

since the combination of A, B and Replicate uniquely identifies each pot. Otherwise, you could assign an ID to each pot.
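One way to set up the Pot factor in R, as a minimal sketch: the column names (TreatmentA, TreatmentB, Replicate, Time, Pot, PotID) and the level labels are hypothetical placeholders. Building Pot from the A x B x Replicate combination and assigning a separate integer ID give the same 20 groups:

```r
# Hypothetical design: 2 levels of A x 2 levels of B x 5 replicates,
# with each pot sampled at 2 times
d <- expand.grid(Replicate  = factor(1:5),
                 TreatmentA = factor(c("A1", "A2")),
                 TreatmentB = factor(c("B1", "B2")),
                 Time       = factor(c("T1", "T2")))

# A x B x Replicate uniquely identifies a pot, so Pot can be built from it
d$Pot <- interaction(d$TreatmentA, d$TreatmentB, d$Replicate, drop = TRUE)

# Alternative: an arbitrary integer ID per pot (same grouping, different labels)
d$PotID <- factor(as.integer(d$Pot))

nlevels(d$Pot)       # 20 pots
table(table(d$Pot))  # every pot appears twice, once per sampling time
```

Either factor can then be used wherever the pot grouping is needed, e.g. in Error() or as a random-effect grouping factor.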

Myself, I would run something like

aov(assessment1 ~ TreatmentA * TreatmentB * Time + Replicate + Replicate:TreatmentA:TreatmentB)

If the Treatment A : Treatment B : Replicate term is not significant relative to the error mean square (EMS), I would recalculate using

aov(assessment1 ~ TreatmentA * TreatmentB * Time)

It’s not unreasonable for Replicate effects to be 0. Pooling gives you a few more error df, and a non-significant test suggests that the Treatment A:Treatment B:Replicate MS is about equal to the EMS, so only one error term is needed for treatment comparisons.
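A minimal sketch of this two-step check on simulated data. The response name assessment1 is borrowed from the lme call in this message; the data, seed, effect sizes and the 0.05 cut-off are assumptions for illustration. Entering Replicate:TreatmentA:TreatmentB as an ordinary fixed term makes summary() test it against the residual mean square:

```r
set.seed(42)
# Hypothetical balanced layout: 2 x 2 treatments, 5 replicates, 2 sampling times
d <- expand.grid(Replicate  = factor(1:5),
                 TreatmentA = factor(c("A1", "A2")),
                 TreatmentB = factor(c("B1", "B2")),
                 Time       = factor(c("T1", "T2")))
# Simulated response with A and Time effects and no pot-to-pot variation
d$assessment1 <- 10 + 2 * (d$TreatmentA == "A2") + (d$Time == "T2") + rnorm(nrow(d))

# Step 1: the pot term (Replicate:A:B) as an ordinary fixed term,
# so its F-ratio uses the residual mean square (EMS) as denominator
full <- aov(assessment1 ~ TreatmentA * TreatmentB * Time + Replicate +
              Replicate:TreatmentA:TreatmentB, data = d)
tab <- summary(full)[[1]]
# Locate the pot term whatever order R prints its factors in
pot_row <- which(sapply(strsplit(trimws(rownames(tab)), ":"), setequal,
                        c("TreatmentA", "TreatmentB", "Replicate")))
p_pot <- tab[pot_row, "Pr(>F)"]

# Step 2: the pooled model, to be used only if p_pot is not significant (e.g. > 0.05)
pooled <- aov(assessment1 ~ TreatmentA * TreatmentB * Time, data = d)

c(p_pot = p_pot, error_df_full = full$df.residual,
  error_df_pooled = pooled$df.residual)
```

In this balanced layout the pot term has 12 df and the residual df should go from 16 to 32 after pooling, which is where the extra error df come from.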

If the Treatment A : Treatment B : Replicate term is significant, you can’t compare means taken from different pots (say, A1B1 at time 2 vs A1B2 at time 2) with the same error term that you use for comparisons within the same pot (A1B1 at time 1 vs A1B1 at time 2). That error structure is easier to get right using lme or lmer with (1 | Replicate / Pot). I would run

aov(assessment1 ~ TreatmentA * TreatmentB * Time + Replicate + Error(TreatmentA:TreatmentB:Replicate))

just to get a convenient set of F-tests for the A and B effects.
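A minimal sketch of that split-plot fit on simulated data (names, levels, seed and effect sizes are assumptions). It uses a single hypothetical Pot factor in Error(), which defines the same pot stratum as Error(TreatmentA:TreatmentB:Replicate) because A x B x Replicate identifies each pot. A, B, A:B and Replicate are then tested against the between-pot error, and the Time terms against the within-pot error:

```r
set.seed(7)
# Hypothetical balanced layout and pot ID (A x B x Replicate identifies a pot)
d <- expand.grid(Replicate  = factor(1:5),
                 TreatmentA = factor(c("A1", "A2")),
                 TreatmentB = factor(c("B1", "B2")),
                 Time       = factor(c("T1", "T2")))
d$Pot <- interaction(d$TreatmentA, d$TreatmentB, d$Replicate, drop = TRUE)

# Simulated response: pot-level noise, a Time effect, and within-pot noise
pot_eff <- rnorm(nlevels(d$Pot), sd = 1)
d$assessment1 <- 10 + pot_eff[as.integer(d$Pot)] + (d$Time == "T2") +
  rnorm(nrow(d), sd = 0.5)

# Split-plot ANOVA: pots are the whole plots, sampling times the subplots
fit <- aov(assessment1 ~ TreatmentA * TreatmentB * Time + Replicate + Error(Pot),
           data = d)
summary(fit)  # "Error: Pot" stratum tests A, B, A:B, Replicate; "Error: Within" tests Time terms
```

With 20 pots the between-pot error has 12 df and the within-pot error 16 df, the two error terms described above.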

If the Treatment A : Treatment B : Time interaction is significant, then you might consider comparing (1 | Replicate / Pot) with (Time | Replicate / Pot). You could just code (Time | Pot) from the start, but I tend to be conservative when moving beyond a simple AOV. On the face of it, this appears to be a standard repeated-measures-as-split-plot design, so I would plan for an ANOVA, but allow for a mixed-model approach if the data warrant it.

Working on a similar problem, I’m using something like

lme(assessment1 ~ TreatmentA * TreatmentB * Time, random = ~ 1 | Replicate/Pot)

as the default mixed-model, repeated-measures-as-split-plot analysis, and then compare it with different correlated-error models (e.g. correlation = corAR1()). With only 2 time points you won’t need to worry about correlated errors, but it might be useful if you end up taking more measurements.
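A minimal sketch of that default model with nlme on simulated data; the data, seed, effect sizes and the Pot ID built from A x B x Replicate are hypothetical. In a balanced layout like this, the denominator df that lme reports for A, B and A:B should match the between-pot error df of the split-plot ANOVA (12), and those for the Time terms the within-pot df (16):

```r
library(nlme)

set.seed(11)
# Hypothetical balanced layout and pot ID
d <- expand.grid(Replicate  = factor(1:5),
                 TreatmentA = factor(c("A1", "A2")),
                 TreatmentB = factor(c("B1", "B2")),
                 Time       = factor(c("T1", "T2")))
d$Pot <- interaction(d$TreatmentA, d$TreatmentB, d$Replicate, drop = TRUE)

# Simulated response: replicate and pot random effects, a Time effect, residual noise
rep_eff <- rnorm(nlevels(d$Replicate), sd = 1.5)
pot_eff <- rnorm(nlevels(d$Pot), sd = 1)
d$assessment1 <- 10 + rep_eff[as.integer(d$Replicate)] +
  pot_eff[as.integer(d$Pot)] + (d$Time == "T2") + rnorm(nrow(d), sd = 0.5)

# Random intercepts for Replicate and for Pot within Replicate
fit <- lme(assessment1 ~ TreatmentA * TreatmentB * Time,
           random = ~ 1 | Replicate/Pot, data = d)

anova(fit)    # F-tests with level-specific denominator df
VarCorr(fit)  # variance components for Replicate, Pot and the residual
```

A correlated-error structure would be added through the correlation argument of lme; with only two measurements per pot it is left out here.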

One caveat: if you end up with missing pots, then you should most certainly skip the AOV calculations and start with a mixed model.
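A minimal sketch of the missing-pot case on simulated data; the seed, effect sizes, names and the choice of which pot is lost are all hypothetical. The R help page for aov notes that it is designed for balanced designs, whereas lme fits the unbalanced data directly:

```r
library(nlme)

set.seed(5)
# Hypothetical balanced layout and pot ID
d <- expand.grid(Replicate  = factor(1:5),
                 TreatmentA = factor(c("A1", "A2")),
                 TreatmentB = factor(c("B1", "B2")),
                 Time       = factor(c("T1", "T2")))
d$Pot <- interaction(d$TreatmentA, d$TreatmentB, d$Replicate, drop = TRUE)

# Simulated response: replicate and pot effects, a Time effect, residual noise
rep_eff <- rnorm(nlevels(d$Replicate), sd = 1.5)
pot_eff <- rnorm(nlevels(d$Pot), sd = 1)
d$assessment1 <- 10 + rep_eff[as.integer(d$Replicate)] +
  pot_eff[as.integer(d$Pot)] + (d$Time == "T2") + rnorm(nrow(d), sd = 0.5)

# Hypothetical loss of one pot: drop both of its observations and the unused level
lost <- levels(d$Pot)[1]
d_miss <- droplevels(subset(d, Pot != lost))

# The same mixed model, fitted to the unbalanced data
fit <- lme(assessment1 ~ TreatmentA * TreatmentB * Time,
           random = ~ 1 | Replicate/Pot, data = d_miss)
summary(fit)$tTable
```

All eight fixed effects remain estimable because the affected A x B combination still has four replicate pots.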

Cheers,

Peter

> On Dec 18, 2017, at 12:13 PM, Tim Richter-Heitmann <trichter at uni-bremen.de> wrote:
> 
> Peter,
> 
> Thank you very much. That was a very elucidating answer.
> If I may, I would like to elaborate a bit more on the issue of time, because I am also a bit clueless here.
> Will the choice of an appropriate modelling approach depend on whether we disrupt the plants (e.g. harvesting soil) or not (e.g. measuring height or surface area)? Or is it just to account for possibly different intercepts for individual plants?
> The plants will be grown from surface-sterilized seeds in homogenized starting soils.
> 
> Also, how would I encode the factor "Pot"? Just with as many levels as I have pots?
> 
> Thank you again very much, Tim
> 
> 
> On 18.12.2017 18:31, Peter Claussen wrote:
>> Tim,
>> 
>> This is more a question of experimental design, but I can answer a bit relevant to mixed models.
>> 
>> In a greenhouse, environmental variation should be negligible and can typically be ignored. In some cases the Location variance is so small that ANOVA gives a negative estimate of it. This is most apparent when the Location F-ratio is less than 1. Briefly, the F-ratio has the error mean square (EMS) in the denominator, while the Location MS in the numerator estimates that same error variance plus another source of variance, i.e. error variance + t*(Location variance). If the observed ratio is less than 1, then the ANOVA estimate of the Location variance, (Location MS - EMS)/t, must be negative.
>> 
>> If this occurs, and you fit the model using lmer with formula = ~ TreatmentA * TreatmentB * Time + (1|Location), you would expect the estimated Location variance to be 0, since it is constrained to be non-negative. If that happens, you can drop Location from the model and fit it as a completely randomized design (CRD) using the three-way ANOVA.
>> 
>> However, you also include time in the model. Is this a repeated measures design? If so, then you might want to fit to a mixed model with Pot (or equivalent) as a random effect.
>> 
>> Cheers,
>> 
>> Peter Claussen
>> Biometrician
>> Gylling Data Management, Inc.
>> Brookings, SD 57006-4605 USA
>> Tel. No.: +1 605 692-4021
>> Website: www.gdmdata.com
>> 
>> 
>>> On Dec 18, 2017, at 9:55 AM, Tim Richter-Heitmann <trichter at uni-bremen.de> wrote:
>>> 
>>> Dear Group,
>>> 
>>> I am planning a very basic factorial greenhouse experiment, and this time I will ask for statistical advice before running it :).
>>> 
>>> It will comprise two treatment types with two levels each and two sampling dates, with five replicates each, resulting in 2 x 2 x 2 x 5 = 40 samples.
>>> 
>>> I guess this is a basic three-way ANOVA (~ Treatment A * Treatment B * Time). However, the arrangement of the replicates in the greenhouse will be randomized. I have only a limited understanding of mixed models, obviously, but does the randomized location of the plants also require the introduction of a random effect (~ Treatment A * Treatment B * Time + (1|Location))? How do I best code location in this case?
>>> 
>>> Thank you!
>>> 
>>> -- 
>>> Tim Richter-Heitmann
>>> 
>>> University of Bremen
>>> Microbial Ecophysiology Group (AG Friedrich)
>>> FB02 - Biologie/Chemie
>>> Leobener Straße (NW2 A2130)
>>> D-28359 Bremen
>>> Tel.: 0049(0)421 218-63062
>>> Fax: 0049(0)421 218-63069
>>> 
>>> _______________________________________________
>>> R-sig-mixed-models at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>> 
> 
