[R] Tukey Kramer with ANOVA (glm)

Thu Jun 14 18:04:03 CEST 2012

On Jun 13, 2012, at 7:36 PM, Alaska_Man wrote:

> Hello,
>
> I am performing a BACI analysis with ANOVA using the following glm:

I admit I had no idea what a "BACI analysis" might be. Looking it up  
it appears to be a cross-over design and my statistical betters have  
sternly warned me about this regression briar patch in the past. I'm  
especially suspicious of the lack of any statements about the balance  
in the sampling in your presentation. (And for that matter the  
extremely sketchy statement of design.)

>
> fit1<-glm(log(Cucs_m+1)~(BA*Otter)+BA+Otter+ID+Primary, data=b1)

I'm guessing you do not understand that BA*Otter in an R formula  
expands to BA + Otter + BA:Otter

> The summary(aov(fit1)) shows significance in the interaction;  
> however, now I
> would like to determine what combinations of BA and Otter are  
> significantly
> different (each factor has two levels).  ID and PRIMARY substrates are
> categorical and included in the model to help explain some of the  
> variation
> in the data.  The data is unbalanced so I plan on using Tukey Kramer  
> post
> hoc analysis.  Here is how my data is laid out, it is a fairly  
> substantial
> data set:

Editing done on original (although it proved unrevealing.)
> Subdistrict  T   Year  Cucs_m  Primary Persistence   Otter  
> Fishing    BA         ID
> 109-41,42    9  2010   0.00     sil           3      1        
> 1        A   109-41,42
> 109-41,42   13  2010   2.75     rck           3      1        
> 1        A   109-41,42
> 109-41,42   16  2010   2.00     rck           3      0        
> 1        A   109-41,42
> 109-41,42   18  2010   8.25     rck           3      0        
> 0        B   109-41,42
>
> I am assuming this is an appropriate pairwise comparison analysis  
> and I
> cannot get the code to work with my data.

What does it mean to be doing "pairwise comparisons" on two-level  
factor variables?)

>  I am *unclear how to code it to
> work with the interaction*; however, even when I attempt to use it  
> only for
> a single factor, it does not work (see below).
>
> x<-aov(glm(Cucs_m~as.factor(BA),data=cuc))
> glht(x, linfct=mcp(BA="Tukey"))
> ....................................
> Error in mcp2matrix(model, linfct = linfct) :
>  Variable(s) ‘BA’ have been specified in ‘linfct’ but cannot be  
> found in
> ‘model’!

I suspect the glht() function is looking for 'as.factor(BA)` in the  
model matrix and not finding it. If BA is not already a factor, then  
it would make sense to do:

cuc$BA <- factor(cuc$BA)

.... before any analysis. Notice that you get a warning that  
performing contrasts in the presence of interactions is something to  
be warned about. If you do not know what you are doing here (and your  
proposed analysis hints at that possibility), I may have set a trap  
for you by solving a syntactic problem but not solving a conceptual  
problem.

 > mod <- glm(log(breaks) ~ wool*tension, data=subset(warpbreaks,  
tension %in% c("L","M")))
 > glht(mod, linfct=mcp(tension="Tukey"))

	 General Linear Hypotheses
Multiple Comparisons of Means: Tukey Contrasts

Linear Hypotheses:
            Estimate
M - L == 0  -0.6012

Warning message:
In mcp2matrix(model, linfct = linfct) :
   covariate interactions found -- default contrast might be  
inappropriate
---------------
Looking at the mod-object you see that the "Estimate" above is  
actually NOT what you had interest in. You were presumably more  
interested in the contrast woolB:tensionM  whose coefficient was 0.6281.
----

Coefficients:
    (Intercept)           woolB        tensionM  woolB:tensionM
         3.7179         -0.4356         -0.6012          0.6281
----------

I would have instead done something like this:

 > mod <- glm(log(breaks) ~ wool*tension, data=subset(warpbreaks,  
tension %in% c("L","M")))
 > mod2 <- glm(log(breaks) ~ wool+tension, data=subset(warpbreaks,  
tension %in% c("L","M")))

 > anova(mod,mod2)
Analysis of Deviance Table

Model 1: log(breaks) ~ wool * tension
Model 2: log(breaks) ~ wool + tension
   Resid. Df Resid. Dev Df Deviance
1        32     4.6235
2        33     5.5113 -1 -0.88777

Now I can say that the addition of an interaction term resulted in a  
non-significant improvement in model fit at least when measured on the  
log(breaks) scale. (Note: This is quite a different result than one  
sees on the untransformed scale where the interaction is highly  
significant.) When your factors are both binary, the effect estimates  
fit nicely into a 2 x 2 table and the consideration of the single  
contrast added by the interaction is fairly simple.

                   wool=='A'                  wool=='B'
  tension=='L'      3.7179                -0.4356+3.7179
  tension=='M'     -0.6012+3.7179          0.6281+3.7179

> Can anyone off suggestions on potential problems with my approach  
> and/or
> script issues?

Why was the log transformation being done? Is the desired outcome a  
statement about ratios?

-- 

David Winsemius, MD
West Hartford, CT