[R] Tukey Kramer with ANOVA (glm)
David Winsemius
dwinsemius at comcast.net
Fri Jun 15 06:03:03 CEST 2012
On Jun 14, 2012, at 2:17 PM, Alaska_Man wrote:
>
> Dr. Winsemius,
> Really quick, a BACI is a Before-After-Control-Impact approach. I
> have a long time series of sea cucumber density estimates, which are
> taken at the same location(s) through time. Some are in areas
> Impacted by sea otters and some are in areas Not Impacted by sea
> otters (two levels). Each estimate is also coded with a "B" if it
> is Before the time sea otters showed up at the impacted sites or "A"
> if after (Impact and Control sites are both coded with BA).
So you probably need a mixed-effects analysis because you have
repeated measures (how many?) in the same location. (How many
locations?)
> BACI analyses suggest an impact if the ANOVA interaction term
> (BA*Otter) is significant; i.e., changes in sea cucumber density
> from before to after depend on whether sea otters are present. I
> log transformed the response to help normalize the data, as it has
> many zeros.
I'm not sure that makes good sense. I don't think those are structural
zeros. Knowing how difficult it is to find sea critters, I think you
still have a significant possibility that one or more sea cumcumbers
was missed in those sites with measured zero values. For one thing the
log of 0 is not a well defined value. For another thing I think it
inflates the impact of small numbers on the inferential statistic, And
for a third thing, the interpretation of effects gets all confused.
You are not really interested in a ratio effect measure, at least I
wouldn't. Far preferable would be to use some type of robust
statistic to handle the inference issues and keep the estimates on a
linear scale, perhaps using bootstrap methods. Davison and Hinkley
have pre-post designs in their "practicals".
> While shapiro-wilks does not suggest normality, it is a very large
> data set and it is "approximately" normal based on graphical
> examination. Again, the data is unbalanced, as there are many more
> estimates at the control sites and before period.
So you should not be using aov, since that method assumes balance.
Regression methods should be appropriate.
> With that said... I would like to perform the following pairwise
> comparisons; B:With Otters v. A:With Otters, B:Without Otters v.
> A:Without Otters
> I am performing other ANOVAs with different data and no interaction,
> where I would like to perform multiple pairwise comparisons between
> the fivelevels of a single factor. I used the code I provided
> previously and still received error messages. If I can get this
> BACI interaction problem figured out, I should manage to adjust it
> to other models. I recently came accross Dunnett's Modified Tukey
> Kramer (DTK.test) and it appears to address the same issue of
> unbalanced data and has a very straightforward script (although I am
> not sure it lends itself to interactions?). Is this test an
> appropriate substitute for the glht method?
> You wrote, "When your factors are both binary, the effect estimates
> fit nicely into a 2 x 2 table and the consideration of the single
> contrast added by the interaction is fairly
> simple" wool=='A'
> wool=='B' tension=='L' 3.7179
> -0.4356+3.7179 tension=='M' -0.6012+3.7179
> 0.6281+3.7179
> This output seems to be what I am looking for; assuming that if the
> value range for a comparison includes zero, then there is no
> significant difference?
Those are the predicted levels of log(breaks) at various combinations
of wool and tension. You can pretty much always create such a table
from the coefficients in a linear model. (Since you used glm() without
a family argument you got a linear link.)
> Where did those values come from?
I just read them off the output of print(model) and added the
appropriate contrasts to the baseline "Intercept" which applies to the
wool=="A" and tension=="L" category . With R's default treatment
contrasts, all coefficients are referenced to the Intercept, and so
you need to add back each of the coefficients to get estimates for the
separate groups.
> I hope this helps clear up my problem. If you have concerns about
> pitfalls with this approach, then I would love to hear them and I
> can research them outside of this thread.
I would think this should be discussed with your advisor. If s/he
thinks its appropriate to get further Internet-mediated advice, then
you should go either to stats.stackexchange or the R-SIG-ME mailing
list where they have better minds than mine to bring to bear on
designs that are hierarchal.
> This is part of a masters thesis and needs to be sound.
> Thank you very much for your time.
> Sean
>
> Date: Thu, 14 Jun 2012 10:26:58 -0700
> From: ml-node+s789695n4633417h88 at n4.nabble.com
> To: seanlarson5 at hotmail.com
> Subject: Re: Tukey Kramer with ANOVA (glm)
>
>
>
>
> On Jun 13, 2012, at 7:36 PM, Alaska_Man wrote:
>
>
>> Hello,
>
>>
>
>> I am performing a BACI analysis with ANOVA using the following glm:
>
>
> I admit I had no idea what a "BACI analysis" might be. Looking it up
>
> it appears to be a cross-over design and my statistical betters have
>
> sternly warned me about this regression briar patch in the past. I'm
>
> especially suspicious of the lack of any statements about the balance
>
> in the sampling in your presentation. (And for that matter the
>
> extremely sketchy statement of design.)
>
>
>>
>
>> fit1<-glm(log(Cucs_m+1)~(BA*Otter)+BA+Otter+ID+Primary, data=b1)
>
>
> I'm guessing you do not understand that BA*Otter in an R formula
>
> expands to BA + Otter + BA:Otter
>
>
>> The summary(aov(fit1)) shows significance in the interaction;
>
>> however, now I
>
>> would like to determine what combinations of BA and Otter are
>
>> significantly
>
>> different (each factor has two levels). ID and PRIMARY substrates
>> are
>
>> categorical and included in the model to help explain some of the
>
>> variation
>
>> in the data. The data is unbalanced so I plan on using Tukey Kramer
>
>> post
>
>> hoc analysis. Here is how my data is laid out, it is a fairly
>
>> substantial
>
>> data set:
>
> Editing done on original (although it proved unrevealing.)
>
>> Subdistrict T Year Cucs_m Primary Persistence Otter
>
>> Fishing BA ID
>
>> 109-41,42 9 2010 0.00 sil 3 1
>
>> 1 A 109-41,42
>
>> 109-41,42 13 2010 2.75 rck 3 1
>
>> 1 A 109-41,42
>
>> 109-41,42 16 2010 2.00 rck 3 0
>
>> 1 A 109-41,42
>
>> 109-41,42 18 2010 8.25 rck 3 0
>
>> 0 B 109-41,42
>
>>
>
>> I am assuming this is an appropriate pairwise comparison analysis
>
>> and I
>
>> cannot get the code to work with my data.
>
> What does it mean to be doing "pairwise comparisons" on two-level
>
> factor variables?)
>
>
>> I am *unclear how to code it to
>
>> work with the interaction*; however, even when I attempt to use it
>
>> only for
>
>> a single factor, it does not work (see below).
>
>>
>
>> x<-aov(glm(Cucs_m~as.factor(BA),data=cuc))
>
>> glht(x, linfct=mcp(BA="Tukey"))
>
>> ....................................
>
>> Error in mcp2matrix(model, linfct = linfct) :
>
>> Variable(s) ‘BA’ have been specified in ‘linfct’ but cannot be
>
>> found in
>
>> ‘model’!
>
> I suspect the glht() function is looking for 'as.factor(BA)` in the
>
> model matrix and not finding it. If BA is not already a factor, then
>
> it would make sense to do:
>
>
> cuc$BA <- factor(cuc$BA)
>
>
> .... before any analysis. Notice that you get a warning that
>
> performing contrasts in the presence of interactions is something to
>
> be warned about. If you do not know what you are doing here (and your
>
> proposed analysis hints at that possibility), I may have set a trap
>
> for you by solving a syntactic problem but not solving a conceptual
>
> problem.
>
>
>> mod <- glm(log(breaks) ~ wool*tension, data=subset(warpbreaks,
>
> tension %in% c("L","M")))
>
>> glht(mod, linfct=mcp(tension="Tukey"))
>
>
> General Linear Hypotheses
>
> Multiple Comparisons of Means: Tukey Contrasts
>
>
> Linear Hypotheses:
>
> Estimate
>
> M - L == 0 -0.6012
>
>
> Warning message:
>
> In mcp2matrix(model, linfct = linfct) :
>
> covariate interactions found -- default contrast might be
>
> inappropriate
>
> ---------------
>
> Looking at the mod-object you see that the "Estimate" above is
>
> actually NOT what you had interest in. You were presumably more
>
> interested in the contrast woolB:tensionM whose coefficient was
> 0.6281.
>
> ----
>
>
> Coefficients:
>
> (Intercept) woolB tensionM woolB:tensionM
>
> 3.7179 -0.4356 -0.6012 0.6281
>
> ----------
>
>
> I would have instead done something like this:
>
>
>> mod <- glm(log(breaks) ~ wool*tension, data=subset(warpbreaks,
>
> tension %in% c("L","M")))
>
>> mod2 <- glm(log(breaks) ~ wool+tension, data=subset(warpbreaks,
>
> tension %in% c("L","M")))
>
>
>> anova(mod,mod2)
>
> Analysis of Deviance Table
>
>
> Model 1: log(breaks) ~ wool * tension
>
> Model 2: log(breaks) ~ wool + tension
>
> Resid. Df Resid. Dev Df Deviance
>
> 1 32 4.6235
>
> 2 33 5.5113 -1 -0.88777
>
>
> Now I can say that the addition of an interaction term resulted in a
>
> non-significant improvement in model fit at least when measured on the
>
> log(breaks) scale. (Note: This is quite a different result than one
>
> sees on the untransformed scale where the interaction is highly
>
> significant.) When your factors are both binary, the effect estimates
>
> fit nicely into a 2 x 2 table and the consideration of the single
>
> contrast added by the interaction is fairly simple.
>
>
> wool=='A' wool=='B'
>
> tension=='L' 3.7179 -0.4356+3.7179
>
> tension=='M' -0.6012+3.7179 0.6281+3.7179
>
>
>> Can anyone off suggestions on potential problems with my approach
>
>> and/or
>
>> script issues?
>
>
> Why was the log transformation being done? Is the desired outcome a
>
> statement about ratios?
>
>
> --
>
>
> David Winsemius, MD
>
> West Hartford, CT
>
>
> ______________________________________________
>
> [hidden email] mailing list
>
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
>
>
>
>
>
>
>
>
> If you reply to this email, your message will be added to the
> discussion below:
> http://r.789695.n4.nabble.com/Tukey-Kramer-with-ANOVA-glm-tp4633314p4633417.html
>
>
>
> To unsubscribe from Tukey Kramer with ANOVA (glm), click here.
>
> NAML
>
>
> --
> View this message in context: http://r.789695.n4.nabble.com/Tukey-Kramer-with-ANOVA-glm-tp4633314p4633435.html
> Sent from the R help mailing list archive at Nabble.com.
> [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
David Winsemius, MD
West Hartford, CT
More information about the R-help
mailing list