[R-sig-eco] Question About Syntax For Complex ANOVA Design

Fri Nov 7 20:13:56 CET 2008

Hi Mike,

    Thanks for the response, hopefully we can get this thing figured 
out.  And I hope things will be easier than I think--that's usually a 
good thing!  However, I do wonder if I didn't explain myself fully (too 
often the case).  My apologies for not clearly stating the hypotheses.  
Basically, the question is 'does HSP differ between high and low 
intertidal locations, and does that relationship depend on the site, the 
coast, and the time of year?'  That can be broken up into a number of 
single-factor H0s. 

    I'm not a marine person, but I'm pretty sure that the high-low MBL 
difference is well understood, since mussels in the high intertidal are 
exposed more to temperature stress, so they should be producing more 
heat shock protein (HSP).  However, this relationship likely varies 
depending on a number of things that vary across sites (geology, tidal 
action, etc).  There is almost certainly a time signature in production 
related to climatic cycles, so that's why they sampled 3 times during 
the year.  So, I guess they're mostly interested in how the MBL 
difference changes across space and time. 

    So, yes, I do want to know if time trends are different at the 
different sites (sorry for not articulating that).  So, in that case, I 
should have 'time' in the random part of the equation, correct?

    As for the other terms, I am confused by the way you suggested 
including them in the model, especially with regards to their not being 
nested (in your formulation).  The sampling regime (MBL within site 
within coast) is nested, and so the data should be analyzed as such.  
 From the way I understand nestedness in R, I have to write it out as 
c/b/a where a is nested in b is nested in c.  So I guess I am confused 
as to if     lme(HSP~coast+MBL, random= ~time|site)  would account for 
that or not.  And I'm aware that it's kind of silly to worry about 
random vs fixed effects with few levels.  However, from my understanding 
of nested ANOVA, that makes a big difference in the way that the F 
ratios are calculated (i.e. what is used to construct the 
denominator).   Since site is random, MS coast should be compared to MS 
site, rather than MS residual.  Again, that's at least from my 
understanding of how it works in the old-school way, perhaps the REML 
fitting does it differently, and so it wouldn't matter? 

    Thanks!

--Joe

Mike Dunbar wrote:
> Hi Joe
>
> I think the command you want is probably simpler than you think:
>
>  lme(HSP~coast*MBL, random= ~1|site)     
> or
>  lme(HSP~coast+MBL, random= ~1|site)     
>
> coast and MBL have distinct levels so are fixed and site is random as you say.
> Having site as random will take into account that there are repeated measures through time at each point (MBL).
> Each site has two points (MBLs). lme will treat coast and MBL correctly providing that they are coded correctly. You can check this by running the analysis and looking at the degrees of freedom for the fixed effects. It's best not to thing to hard about the structure in terms of fixed/random/fixed: even though other stats packages might encourage this. Think of the random effects (including the error term)  providing the structure and the fixed effects slotting into that structure accordingly.
> If you write random = ~time|site then you are saying random slopes for the time fixed effect, i.e. there is an overall time trend and each site responds differently around that trend. I don't think this is what you want as you don't specifically mention time trends.
>
> Or if time of year is a factor, something like
> lme(HSP~coast+MBL+time, random= ~1|site)    
> But the problem here is that you may run out of replication to estimate any of the fixed effects, each combination of coast, MBL, site and time is unique. 
>
> Also BUT BUT, even if you allow the three time samples to be replicates:
> You are potentially going to have an issue if you only have four sites. This is not alot to estimate a random effect. One option is to treat site as fixed. There is an argument that site is indeed random, so it should be treated as random, in which case I'm not sure that lme (or the newer lmer) will handle the full uncertainty for small sample sizes correctly). To do that would need a more fully Bayesian approach. But I'm writing that from memory, I don't have the reference to hand. 
>
> Finally, it all depends on what the hypothesis you are trying to test: what's the hypothesis?
>
> regards
>
> Mike
>
>
>
>
>
>   
>>>> Joe Simonis <jls468 at cornell.edu> 07/11/2008 15:53 >>>
>>>>         
> Hey Everyone,
>
>     I'm helping a friend out with analyzing some of her data, and I 
> haven't run an ANOVA like this in a while, and especially not in R.  I'm 
> having a bit of trouble figuring out the correct syntax and so I was 
> hoping to get feedback.  Any input would be welcomed.  As of now, I also 
> don't have the data, but I've been told that sample size should be equal 
> for all of the combinations (although that may not be true).  In any 
> case, for now, let's assume all sample sizes are equal.
>
>     The basics of the mensurative experiment are as follows:
>
>     The study was looking at variation in physiological values (HSP) of 
> intertidal mussels across a few different sites at three different times 
> of year.  The sampling was done in New Zealand, with 2 sites sampled on 
> each of the East and West coasts, and within each site, there were two 
> sampling points (mussel bed location, MBL), one low in the intertidal 
> one high in the intertidal.  There are two levels of MBL (low and high) 
> at each site and there are two sites for each cost.  I see this as MBL 
> nested in site, nested in coast.  However, it seems to me that only site 
> is a random factor.  Both MBLs were picked specifically at that site and 
> were done so in a way to compare high to low locations, so that seems 
> fixed to me.  Site was picked more to look at site-to-site variation 
> (i.e random factor).  And coasts were explicitly being compared (i.e. 
> fixed factor). 
>
>     So, I see that as a fixed factor nested in a random factor  nested 
> in a fixed factor.  Does that make sense?  And then there's the bit 
> about repeated measures, since they sampled mussels from each MBL 3 
> times.  I don't think that necessarily complicates things too much, but 
> maybe it does?  I can put together the ANOVA table on paper in the way 
> I'd like to analyze the data (working off of examples in Quinn and 
> Keough on pg 314, but with an additional level of nestedness).  However, 
> I am pretty lost on how to code the syntax for the analysis in R.  I've 
> had a few different ideas, but none of them really seem correct to me.  
> I think the biggest problem for me is figuring out how to keep the 
> structure of the nestedness in tact despite the fact that some factors 
> are fixed and some are random. 
>
>     The best I could come up with so far is     lme(HSP~MBL, random= 
> ~time|coast/site)     
>
>     But it doesn't seem really right to me.  I have MBL in the fixed 
> part of the model, but when it's like that, I think the nestedness gets 
> lost (since there's no cost/site/MBL anywhere). 
>
>     Again, I would appreciate any insights into the syntax or the 
> general way I am approaching this analysis.  I've been trying to piece 
> the stuff together from nested analyses and time-series analyses in 
> Crawley's book, but I'm just not getting it.
>
>     Thanks a bunch in advance!!
>
> --Joe
>
>   

-- 
Joseph L. Simonis

Cornell University
Department of Ecology & Evolutionary Biology
E231 Corson Hall
Ithaca, NY 14853 USA
email:  jls468 at cornell.edu

http://www.people.cornell.edu/pages/jls468/