[R-sig-eco] Logistic regression with 2 categorical predictors
Andrew Halford
andrew.halford at gmail.com
Wed Oct 22 02:21:49 CEST 2014
Hi Thierry,
The multiple comparisons ran just fine but there was a ridiculous amount of
interaction combinations all of which were non-significant even though
there was a highly significant interaction term. I decided to remove test
as a variable to simplify the analysis and run separate single explanatory
variable logistic regressions. I have included a result below which is
still producing an outcome I cant explain. Namely, why am I getting such a
significant result for the ANOVA but when I do the tukey tests nothing is
significant?
> sg_habitat
Age Prefer Avoid
1 1 17 14
2 2 20 10
3 3 14 9
4 4 13 12
5 5 0 18
6 6 0 5
> model_sg <- glm(cbind(Prefer,Avoid) ~ Age, data=sg_habitat,
family=binomial)
> anova(model_sg, test="Chisq")
Analysis of Deviance Table
Model: binomial, link: logit
Response: cbind(Prefer, Avoid)
Terms added sequentially (first to last)
Df Deviance Resid. Df Resid. Dev Pr(>Chi)
NULL 5 36.588
Age 5 36.588 0 0.000 7.243e-07 ***
> mc_sg <- glht(model_sg, mcp(Age = "Tukey"))
> summary(mc_sg)
Simultaneous Tests for General Linear Hypotheses
Multiple Comparisons of Means: Tukey Contrasts
Fit: glm(formula = cbind(Prefer, Avoid) ~ Age, family = binomial,
data = sg_habitat)
Linear Hypotheses:
Estimate Std. Error z value Pr(>|z|)
2 - 1 == 0 0.4990 0.5294 0.943 0.912
3 - 1 == 0 0.2477 0.5593 0.443 0.997
4 - 1 == 0 -0.1141 0.5390 -0.212 1.000
5 - 1 == 0 -25.8473 53178.5362 0.000 1.000
6 - 1 == 0 -24.7307 57729.9299 0.000 1.000
3 - 2 == 0 -0.2513 0.5767 -0.436 0.997
4 - 2 == 0 -0.6131 0.5570 -1.101 0.844
5 - 2 == 0 -26.3463 53178.5362 0.000 1.000
6 - 2 == 0 -25.2296 57729.9299 0.000 1.000
4 - 3 == 0 -0.3618 0.5855 -0.618 0.985
5 - 3 == 0 -26.0950 53178.5362 0.000 1.000
6 - 3 == 0 -24.9783 57729.9299 0.000 1.000
5 - 4 == 0 -25.7332 53178.5362 0.000 1.000
6 - 4 == 0 -24.6165 57729.9299 0.000 1.000
6 - 5 == 0 1.1167 78490.1364 0.000 1.000
(Adjusted p values reported -- single-step method)
On 21 October 2014 22:53, ONKELINX, Thierry <Thierry.ONKELINX at inbo.be>
wrote:
> Hi Andrew,
>
> Please keep the mailing list in cc.
>
> The estimates in mc are the differences of the parameter estimates (betas)
> between both levels. E.g. 5.LR -1.LR = -1.168 or 5.LR = 1.LR - 1.168
>
> summary(mc) should give you the significance of those differences. That
> should work. If it doesn't, please provide more info: at least your code
> and the error message. A small reproducible example is better.
> confint(mc) gives the confidence intervals of the differences.
>
> Best regards,
>
> Thierry
>
> ir. Thierry Onkelinx
> Instituut voor natuur- en bosonderzoek / Research Institute for Nature and
> Forest
> team Biometrie & Kwaliteitszorg / team Biometrics & Quality Assurance
> Kliniekstraat 25
> 1070 Anderlecht
> Belgium
> + 32 2 525 02 51
> + 32 54 43 61 85
> Thierry.Onkelinx at inbo.be
> www.inbo.be
>
> To call in the statistician after the experiment is done may be no more
> than asking him to perform a post-mortem examination: he may be able to say
> what the experiment died of.
> ~ Sir Ronald Aylmer Fisher
>
> The plural of anecdote is not data.
> ~ Roger Brinner
>
> The combination of some data and an aching desire for an answer does not
> ensure that a reasonable answer can be extracted from a given body of data.
> ~ John Tukey
>
> Van: Andrew Halford [mailto:andrew.halford at gmail.com]
> Verzonden: dinsdag 21 oktober 2014 16:19
> Aan: ONKELINX, Thierry
> Onderwerp: Re: [R-sig-eco] Logistic regression with 2 categorical
> predictors
>
> Hi Thierry,
> Thanks for the response. I have run your code but it seems you cant call
> the summary function, you just have to call the object on its own i.e. mc.
> The results are I get are below but I am not sure how to interpret these,
> exactly what does the estimate represent? How do I measure significance
> here?
>
> Estimate
> 2.LR - 1.LR == 0 1.252e-01
> 3.LR - 1.LR == 0 -5.390e-01
> 4.LR - 1.LR == 0 1.802e-02
> 5.LR - 1.LR == 0 -1.168e+00
> 6.LR - 1.LR == 0 -2.575e+01
> 1.SD - 1.LR == 0 7.411e-02
> 2.SD - 1.LR == 0 -2.408e-01
> 3.SD - 1.LR == 0 2.675e-01
> etc etc
>
> Andy
>
> On 20 October 2014 23:04, ONKELINX, Thierry <Thierry.ONKELINX at inbo.be>
> wrote:
> Dear Andrew,
>
> anova() and summary() test different hypotheses. anova() tests is at least
> one level is different from the others. summary() tests if the coefficient
> is different from zero.
>
> Multiple comparison of different interaction levels is probably the most
> relevant in this case. The easiest way is to make a new variable.
>
> snapper2$inter <- with(snapper2, interaction(age, test))
> model <- glm(cbind(prefer,avoid) ~ 0 + inter, data=snapper2,
> family=binomial)
> library(multcomp)
> mc <- glht(model, mcp(inter = "Tukey))
> summary(mc)
>
> Best regards,
>
> ir. Thierry Onkelinx
> Instituut voor natuur- en bosonderzoek / Research Institute for Nature and
> Forest
> team Biometrie & Kwaliteitszorg / team Biometrics & Quality Assurance
> Kliniekstraat 25
> 1070 Anderlecht
> Belgium
> + 32 2 525 02 51
> + 32 54 43 61 85
> Thierry.Onkelinx at inbo.be
> www.inbo.be
>
> To call in the statistician after the experiment is done may be no more
> than asking him to perform a post-mortem examination: he may be able to say
> what the experiment died of.
> ~ Sir Ronald Aylmer Fisher
>
> The plural of anecdote is not data.
> ~ Roger Brinner
>
> The combination of some data and an aching desire for an answer does not
> ensure that a reasonable answer can be extracted from a given body of data.
> ~ John Tukey
>
>
> -----Oorspronkelijk bericht-----
> Van: r-sig-ecology-bounces at r-project.org [mailto:
> r-sig-ecology-bounces at r-project.org] Namens Andrew Halford
> Verzonden: maandag 20 oktober 2014 16:06
> Aan: r-sig-ecology at r-project.org
> Onderwerp: [R-sig-eco] Logistic regression with 2 categorical predictors
>
> Hi Listers,
>
> I am trying to run a logistic regression to look at the effects of
> experiment type and age on the behavior of fish in a choice chamber
> experiment.
>
> I am using the glm approach and would like some advice on how or whether
> to perform contrasts to work out what levels of Factor1 (Age) and Factor 2
> (Test) are significantly different from each other. I have not been able
> to clarify from my reading what is the appropriate approach to take when
> dealing with a significant interaction term. I am also not sure as to how
> one interprets a model when all the coefficients are non-significant but
> the chi-square ANOVA shows a highly significant interaction term.
>
> I have graphed up the data as dot plots and there is definitely evidence
> of changes in proportions in later ages.
>
> I want to provide evidence for when and for which tests there was a
> 'significant' change in behavior.
>
> > snapper2
> age test prefer avoid
> 1 1 LR 15 14
> 2 1 SD 15 13
> 3 1 SG 17 14
> 4 1 SW 14 14
> 5 2 LR 17 14
> 6 2 SD 16 19
> 7 2 SG 20 10
> 8 2 SW 15 21
> 9 3 LR 10 16
> 10 3 SD 14 10
> 11 3 SG 14 9
> 12 3 SW 13 15
> 13 4 LR 12 11
> 14 4 SD 14 11
> 15 4 SG 13 12
> 16 4 SW 11 14
> 17 5 LR 4 12
> 18 5 SD 8 8
> 19 5 SG 0 18
> 20 5 SW 10 6
> 21 6 LR 0 6
> 22 6 SD 3 4
> 23 6 SG 0 5
> 24 6 SW 5 3
>
> >
>
> dotplot(age~prefer/avoid,data=snapper2,group=snapper2$test,cex=1.5,pch=19,ylab="age",auto.key=list(space="right",title="Tests"))
>
>
> > out2 <- glm(cbind(prefer,avoid) ~ age*test, data=snapper2,
> family=binomial)
>
> > summary(out2)
>
> Call:
> glm(formula = cbind(prefer, avoid) ~ age * test, family = binomial,
> data = snapper2)
>
> Deviance Residuals:
> [1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
> 0
>
> Coefficients:
> Estimate Std. Error z value Pr(>|z|)
> (Intercept) 6.899e-02 3.716e-01 0.186 0.8527
> age2 1.252e-01 5.180e-01 0.242 0.8091
> age3 -5.390e-01 5.483e-01 -0.983 0.3256
> age4 1.802e-02 5.589e-01 0.032 0.9743
> age5 -1.168e+00 6.866e-01 -1.701 0.0890 .
> age6 -2.575e+01 9.348e+04 0.000 0.9998
> testSD 7.411e-02 5.307e-01 0.140 0.8890
> testSG 1.252e-01 5.180e-01 0.242 0.8091
> testSW -6.899e-02 5.301e-01 -0.130 0.8964
> age2:testSD -4.401e-01 7.260e-01 -0.606 0.5444
> age3:testSD 7.324e-01 7.846e-01 0.933 0.3506
> age4:testSD 8.004e-02 7.863e-01 0.102 0.9189
> age5:testSD 1.024e+00 9.301e-01 1.102 0.2707
> age6:testSD 2.532e+01 9.348e+04 0.000 0.9998
> age2:testSG 3.738e-01 7.407e-01 0.505 0.6138
> age3:testSG 7.867e-01 7.832e-01 1.004 0.3152
> age4:testSG -1.321e-01 7.764e-01 -0.170 0.8649
> age5:testSG -2.568e+01 8.768e+04 0.000 0.9998
> age6:testSG 2.121e-02 1.334e+05 0.000 1.0000
> age2:testSW -4.616e-01 7.249e-01 -0.637 0.5242
> age3:testSW 3.959e-01 7.662e-01 0.517 0.6054
> age4:testSW -2.592e-01 7.858e-01 -0.330 0.7415
> age5:testSW 1.678e+00 9.386e-01 1.788 0.0737 .
> age6:testSW 2.626e+01 9.348e+04 0.000 0.9998
> ---
> Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
>
> (Dispersion parameter for binomial family taken to be 1)
>
> Null deviance: 5.4908e+01 on 23 degrees of freedom Residual
> deviance: 2.6113e-10 on 0 degrees of freedom
> AIC: 122.73
>
> Number of Fisher Scoring iterations: 23
>
>
> > anova(out2, test="Chisq")
>
> Analysis of Deviance Table
>
> Model: binomial, link: logit
>
> Response: cbind(prefer, avoid)
>
> Terms added sequentially (first to last)
>
>
> Df Deviance Resid. Df Resid. Dev Pr(>Chi)
> NULL 23 54.908
> age 5 11.235 18 43.673 0.0469115 *
> test 3 1.593 15 42.079 0.6608887
> age:test 15 42.079 0 0.000 0.0002185 ***
> ---
> Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
>
> cheers
>
> Andy
> [[alternative HTML version deleted]]
>
> _______________________________________________
> R-sig-ecology mailing list
> R-sig-ecology at r-project.org
> https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
> * * * * * * * * * * * * * D I S C L A I M E R * * * * * * * * * * * * *
> Dit bericht en eventuele bijlagen geven enkel de visie van de schrijver
> weer en binden het INBO onder geen enkel beding, zolang dit bericht niet
> bevestigd is door een geldig ondertekend document.
> The views expressed in this message and any annex are purely those of the
> writer and may not be regarded as stating an official position of INBO, as
> long as the message is not confirmed by a duly signed document.
>
>
>
> --
> Andrew Halford Ph.D
> Research Scientist (Kimberley Marine Parks)| Adjunct Research Scientist
> (Curtin University)
> Dept. Parks and Wildlife
> Western Australia
>
> Ph: +61 8 9219 9795
> Mobile: +61 (0) 468 419 473
> * * * * * * * * * * * * * D I S C L A I M E R * * * * * * * * * * * * *
> Dit bericht en eventuele bijlagen geven enkel de visie van de schrijver
> weer en binden het INBO onder geen enkel beding, zolang dit bericht niet
> bevestigd is door een geldig ondertekend document.
> The views expressed in this message and any annex are purely those of the
> writer and may not be regarded as stating an official position of INBO, as
> long as the message is not confirmed by a duly signed document.
>
--
Andrew Halford Ph.D
Research Scientist (Kimberley Marine Parks)| Adjunct Research Scientist
(Curtin University)
Dept. Parks and Wildlife
Western Australia
Ph: +61 8 9219 9795
Mobile: +61 (0) 468 419 473
[[alternative HTML version deleted]]
More information about the R-sig-ecology
mailing list