[R] Tukey Kramer with ANOVA (glm)
David Winsemius
dwinsemius at comcast.net
Thu Jun 14 18:04:03 CEST 2012
On Jun 13, 2012, at 7:36 PM, Alaska_Man wrote:
> Hello,
>
> I am performing a BACI analysis with ANOVA using the following glm:
I admit I had no idea what a "BACI analysis" might be. Looking it up
it appears to be a cross-over design and my statistical betters have
sternly warned me about this regression briar patch in the past. I'm
especially suspicious of the lack of any statements about the balance
in the sampling in your presentation. (And for that matter the
extremely sketchy statement of design.)
>
> fit1<-glm(log(Cucs_m+1)~(BA*Otter)+BA+Otter+ID+Primary, data=b1)
I'm guessing you do not understand that BA*Otter in an R formula
expands to BA + Otter + BA:Otter
> The summary(aov(fit1)) shows significance in the interaction;
> however, now I
> would like to determine what combinations of BA and Otter are
> significantly
> different (each factor has two levels). ID and PRIMARY substrates are
> categorical and included in the model to help explain some of the
> variation
> in the data. The data is unbalanced so I plan on using Tukey Kramer
> post
> hoc analysis. Here is how my data is laid out, it is a fairly
> substantial
> data set:
Editing done on original (although it proved unrevealing.)
> Subdistrict T Year Cucs_m Primary Persistence Otter
> Fishing BA ID
> 109-41,42 9 2010 0.00 sil 3 1
> 1 A 109-41,42
> 109-41,42 13 2010 2.75 rck 3 1
> 1 A 109-41,42
> 109-41,42 16 2010 2.00 rck 3 0
> 1 A 109-41,42
> 109-41,42 18 2010 8.25 rck 3 0
> 0 B 109-41,42
>
> I am assuming this is an appropriate pairwise comparison analysis
> and I
> cannot get the code to work with my data.
What does it mean to be doing "pairwise comparisons" on two-level
factor variables?)
> I am *unclear how to code it to
> work with the interaction*; however, even when I attempt to use it
> only for
> a single factor, it does not work (see below).
>
> x<-aov(glm(Cucs_m~as.factor(BA),data=cuc))
> glht(x, linfct=mcp(BA="Tukey"))
> ....................................
> Error in mcp2matrix(model, linfct = linfct) :
> Variable(s) ‘BA’ have been specified in ‘linfct’ but cannot be
> found in
> ‘model’!
I suspect the glht() function is looking for 'as.factor(BA)` in the
model matrix and not finding it. If BA is not already a factor, then
it would make sense to do:
cuc$BA <- factor(cuc$BA)
.... before any analysis. Notice that you get a warning that
performing contrasts in the presence of interactions is something to
be warned about. If you do not know what you are doing here (and your
proposed analysis hints at that possibility), I may have set a trap
for you by solving a syntactic problem but not solving a conceptual
problem.
> mod <- glm(log(breaks) ~ wool*tension, data=subset(warpbreaks,
tension %in% c("L","M")))
> glht(mod, linfct=mcp(tension="Tukey"))
General Linear Hypotheses
Multiple Comparisons of Means: Tukey Contrasts
Linear Hypotheses:
Estimate
M - L == 0 -0.6012
Warning message:
In mcp2matrix(model, linfct = linfct) :
covariate interactions found -- default contrast might be
inappropriate
---------------
Looking at the mod-object you see that the "Estimate" above is
actually NOT what you had interest in. You were presumably more
interested in the contrast woolB:tensionM whose coefficient was 0.6281.
----
Coefficients:
(Intercept) woolB tensionM woolB:tensionM
3.7179 -0.4356 -0.6012 0.6281
----------
I would have instead done something like this:
> mod <- glm(log(breaks) ~ wool*tension, data=subset(warpbreaks,
tension %in% c("L","M")))
> mod2 <- glm(log(breaks) ~ wool+tension, data=subset(warpbreaks,
tension %in% c("L","M")))
> anova(mod,mod2)
Analysis of Deviance Table
Model 1: log(breaks) ~ wool * tension
Model 2: log(breaks) ~ wool + tension
Resid. Df Resid. Dev Df Deviance
1 32 4.6235
2 33 5.5113 -1 -0.88777
Now I can say that the addition of an interaction term resulted in a
non-significant improvement in model fit at least when measured on the
log(breaks) scale. (Note: This is quite a different result than one
sees on the untransformed scale where the interaction is highly
significant.) When your factors are both binary, the effect estimates
fit nicely into a 2 x 2 table and the consideration of the single
contrast added by the interaction is fairly simple.
wool=='A' wool=='B'
tension=='L' 3.7179 -0.4356+3.7179
tension=='M' -0.6012+3.7179 0.6281+3.7179
> Can anyone off suggestions on potential problems with my approach
> and/or
> script issues?
Why was the log transformation being done? Is the desired outcome a
statement about ratios?
--
David Winsemius, MD
West Hartford, CT
More information about the R-help
mailing list