[R-sig-eco] Logistic regression with 2 categorical predictors

Andrew Halford andrew.halford at gmail.com
Mon Oct 27 01:51:20 CET 2014


Hi Jari,

I've only just seen your post and as you would now know there has been a
generous reply from Philip Dixon in the interim - the full details of which
will take me some time to digest. However I am writing to say thanks for
the response and yes you have guessed right that we had an ever declining
pool of fish to run the choice tests- larval fish are not particularly
robust unfortunately.

I am thinking that Philips suggestion for a series of multiple tests along
with the appropriate correction for doing so is the most palatable for me
at this point in time.

Andy

On 24 October 2014 16:54, Jari Oksanen <jari.oksanen at oulu.fi> wrote:

> Andrew,
>
> If the 24 rows are the data you are analysing, I cannot replicate any of
> your significant results within that glm framework *if* I take into account
> the overdispersion. The full model with age*test interaction is saturated
> and cannot be analysed at all, but the main effects model age+test can be
> analysed (or either term separately). However,  the results are
> overdispersed, and you should use family=quasibinomial instead of
> family=binomial, and then use test="F" instead of test="Chi":
>
> > anova(glm(cbind(prefer,avoid) ~ age+test, data=datafromthemail,
> family=quasibinomial), test="F")
> Analysis of Deviance Table
>
> Model: quasibinomial, link: logit
>
> Response: cbind(prefer, avoid)
>
> Terms added sequentially (first to last)
>
>
>      Df Deviance Resid. Df Resid. Dev      F Pr(>F)
> NULL                    23     54.908
> age   5  11.2352        18     43.673 0.9844 0.4591
> test  3   1.5934        15     42.079 0.2327 0.8722
>
> As you see, the Resid. Dev is much larger than Resid. Df for both terms in
> this sequential model, and therefore you must use quasibinomial models and
> F-tests -- and these give similar results as other tests.
>
> I could not get any results for the saturated models, and one of your
> examples (below in this thread) seemed to use only one level of test and
> only *six* observations: it was also saturated as you had no replicates for
> your six age levels. You need replicates.
>
> However, I'm not sure I understood your data correctly. It looks like you
> have a certain number of animals, but their number is reduced with age so
> that you have a kind of censored data (animals not available in all cases).
> Perhaps somebody can propose a better analysis for such a censored data, if
> it is like that.
>
> Cheers, Jari Oksanen
>
> On 24/10/2014, at 10:41 AM, Andrew Halford wrote:
>
> > Dear Gavin,
> >
> > Firstly let me say that I take offence at your "bogus" comment. Just
> > because I, like many others who interact on this list, often struggle
> > conceptually with the overwhelming analysis choices that are required in
> > our line of work doesn't give you the right to drop snide remarks as you
> > see fit.
> >
> > My line of query is ALWAYS genuine from my perspective and I don't expect
> > you or anyone else to belittle people on the public list!
> >
> > As it turns out my issues are not resolved.
> >
> > To recap..
> >
> > I have run a bunch of choice chamber experiments with larval fish.
> Graphing
> > up the ratio of 0/1 choices produces a plot which shows to my eye,
> evidence
> > of a result for some of the tests, with fish appearing to make defined
> > choices in the later age groups for 2 of the tests.
> >
> > What appears to be happening is that because there are some empty cells
> in
> > the later age x test interactions (the fish only took one option to the
> > exclusion of the other) the errors are way out and hence preclude any
> > chance of getting a significant result. If I add a single result to any
> of
> > the zero cells to remove the "blank" the analysis actually works more as
> I
> > hoped. However I doubt this is acceptable so I am hoping to get some help
> > with producing an effective analysis without having to manipulate the
> blank
> > cells.
> >
> > Andrew
> >
> >
> >
> >
> > On 24 October 2014 04:08, Gavin Simpson <ucfagls at gmail.com> wrote:
> >
> >> This all looks bogus to me; you've fit the data perfectly by fitting a
> >> saturated model - there are no residual degrees of freedom and
> >> (effectively) zero residual deviance. Things are clearly amiss because
> you
> >> have huge standard errors. You have 24 data points and fit a model with
> 23
> >> coefficient plus the intercept; you just replaced your data with 24 new
> >> data points (the values in the Estimate column of the summary() output)
> >>
> >> I really wouldn't bother interpreting it any further.
> >>
> >> HTH
> >>
> >> G
> >>
> >> On 21 October 2014 18:21, Andrew Halford <andrew.halford at gmail.com>
> wrote:
> >>
> >>> Hi Thierry,
> >>>
> >>> The multiple comparisons ran just fine but there was a ridiculous
> amount
> >>> of
> >>> interaction combinations all of which were non-significant even though
> >>> there was a highly significant interaction term. I decided to remove
> test
> >>> as a variable to simplify the analysis and run separate single
> explanatory
> >>> variable logistic regressions. I have included a result below which is
> >>> still producing an outcome I cant explain. Namely, why am I getting
> such a
> >>> significant result for the ANOVA but when I do the tukey tests nothing
> is
> >>> significant?
> >>>
> >>>> sg_habitat
> >>>  Age Prefer Avoid
> >>> 1   1     17    14
> >>> 2   2     20    10
> >>> 3   3     14     9
> >>> 4   4     13    12
> >>> 5   5      0    18
> >>> 6   6      0     5
> >>>
> >>>> model_sg <- glm(cbind(Prefer,Avoid) ~ Age, data=sg_habitat,
> >>> family=binomial)
> >>>
> >>>> anova(model_sg, test="Chisq")
> >>>
> >>> Analysis of Deviance Table
> >>>
> >>> Model: binomial, link: logit
> >>>
> >>> Response: cbind(Prefer, Avoid)
> >>>
> >>> Terms added sequentially (first to last)
> >>>
> >>>
> >>>     Df Deviance Resid. Df Resid. Dev  Pr(>Chi)
> >>> NULL                     5     36.588
> >>> Age   5   36.588         0      0.000 7.243e-07 ***
> >>>
> >>>
> >>>> mc_sg <- glht(model_sg, mcp(Age = "Tukey"))
> >>>
> >>>> summary(mc_sg)
> >>>
> >>>         Simultaneous Tests for General Linear Hypotheses
> >>>
> >>> Multiple Comparisons of Means: Tukey Contrasts
> >>>
> >>>
> >>> Fit: glm(formula = cbind(Prefer, Avoid) ~ Age, family = binomial,
> >>>    data = sg_habitat)
> >>>
> >>> Linear Hypotheses:
> >>>             Estimate Std. Error z value Pr(>|z|)
> >>> 2 - 1 == 0     0.4990     0.5294   0.943    0.912
> >>> 3 - 1 == 0     0.2477     0.5593   0.443    0.997
> >>> 4 - 1 == 0    -0.1141     0.5390  -0.212    1.000
> >>> 5 - 1 == 0   -25.8473 53178.5362   0.000    1.000
> >>> 6 - 1 == 0   -24.7307 57729.9299   0.000    1.000
> >>> 3 - 2 == 0    -0.2513     0.5767  -0.436    0.997
> >>> 4 - 2 == 0    -0.6131     0.5570  -1.101    0.844
> >>> 5 - 2 == 0   -26.3463 53178.5362   0.000    1.000
> >>> 6 - 2 == 0   -25.2296 57729.9299   0.000    1.000
> >>> 4 - 3 == 0    -0.3618     0.5855  -0.618    0.985
> >>> 5 - 3 == 0   -26.0950 53178.5362   0.000    1.000
> >>> 6 - 3 == 0   -24.9783 57729.9299   0.000    1.000
> >>> 5 - 4 == 0   -25.7332 53178.5362   0.000    1.000
> >>> 6 - 4 == 0   -24.6165 57729.9299   0.000    1.000
> >>> 6 - 5 == 0     1.1167 78490.1364   0.000    1.000
> >>> (Adjusted p values reported -- single-step method)
> >>>
> >>>
> >>> On 21 October 2014 22:53, ONKELINX, Thierry <Thierry.ONKELINX at inbo.be>
> >>> wrote:
> >>>
> >>>> Hi Andrew,
> >>>>
> >>>> Please keep the mailing list in cc.
> >>>>
> >>>> The estimates in mc are the differences of the parameter estimates
> >>> (betas)
> >>>> between both levels. E.g. 5.LR -1.LR = -1.168 or 5.LR = 1.LR - 1.168
> >>>>
> >>>> summary(mc) should give you the significance of those differences.
> That
> >>>> should work. If it doesn't, please provide more info: at least your
> code
> >>>> and the error message. A small reproducible example is better.
> >>>> confint(mc) gives the confidence intervals of the differences.
> >>>>
> >>>> Best regards,
> >>>>
> >>>> Thierry
> >>>>
> >>>> ir. Thierry Onkelinx
> >>>> Instituut voor natuur- en bosonderzoek / Research Institute for Nature
> >>> and
> >>>> Forest
> >>>> team Biometrie & Kwaliteitszorg / team Biometrics & Quality Assurance
> >>>> Kliniekstraat 25
> >>>> 1070 Anderlecht
> >>>> Belgium
> >>>> + 32 2 525 02 51
> >>>> + 32 54 43 61 85
> >>>> Thierry.Onkelinx at inbo.be
> >>>> www.inbo.be
> >>>>
> >>>> To call in the statistician after the experiment is done may be no
> more
> >>>> than asking him to perform a post-mortem examination: he may be able
> to
> >>> say
> >>>> what the experiment died of.
> >>>> ~ Sir Ronald Aylmer Fisher
> >>>>
> >>>> The plural of anecdote is not data.
> >>>> ~ Roger Brinner
> >>>>
> >>>> The combination of some data and an aching desire for an answer does
> not
> >>>> ensure that a reasonable answer can be extracted from a given body of
> >>> data.
> >>>> ~ John Tukey
> >>>>
> >>>> Van: Andrew Halford [mailto:andrew.halford at gmail.com]
> >>>> Verzonden: dinsdag 21 oktober 2014 16:19
> >>>> Aan: ONKELINX, Thierry
> >>>> Onderwerp: Re: [R-sig-eco] Logistic regression with 2 categorical
> >>>> predictors
> >>>>
> >>>> Hi Thierry,
> >>>> Thanks for the response. I have run your code but it seems you cant
> call
> >>>> the summary function, you just have to call the object on its own i.e.
> >>> mc.
> >>>> The results are I get are below but I am not sure how to interpret
> >>> these,
> >>>> exactly what does the estimate represent? How do I measure
> significance
> >>>> here?
> >>>>
> >>>>                   Estimate
> >>>> 2.LR - 1.LR == 0  1.252e-01
> >>>> 3.LR - 1.LR == 0 -5.390e-01
> >>>> 4.LR - 1.LR == 0  1.802e-02
> >>>> 5.LR - 1.LR == 0 -1.168e+00
> >>>> 6.LR - 1.LR == 0 -2.575e+01
> >>>> 1.SD - 1.LR == 0  7.411e-02
> >>>> 2.SD - 1.LR == 0 -2.408e-01
> >>>> 3.SD - 1.LR == 0  2.675e-01
> >>>> etc etc
> >>>>
> >>>> Andy
> >>>>
> >>>> On 20 October 2014 23:04, ONKELINX, Thierry <Thierry.ONKELINX at inbo.be
> >
> >>>> wrote:
> >>>> Dear Andrew,
> >>>>
> >>>> anova() and summary() test different hypotheses. anova() tests is at
> >>> least
> >>>> one level is different from the others. summary() tests if the
> >>> coefficient
> >>>> is different from zero.
> >>>>
> >>>> Multiple comparison of different interaction levels is probably the
> most
> >>>> relevant in this case. The easiest way is to make a new variable.
> >>>>
> >>>> snapper2$inter <- with(snapper2, interaction(age, test))
> >>>> model <- glm(cbind(prefer,avoid) ~ 0 + inter, data=snapper2,
> >>>> family=binomial)
> >>>> library(multcomp)
> >>>> mc <- glht(model, mcp(inter = "Tukey))
> >>>> summary(mc)
> >>>>
> >>>> Best regards,
> >>>>
> >>>> ir. Thierry Onkelinx
> >>>> Instituut voor natuur- en bosonderzoek / Research Institute for Nature
> >>> and
> >>>> Forest
> >>>> team Biometrie & Kwaliteitszorg / team Biometrics & Quality Assurance
> >>>> Kliniekstraat 25
> >>>> 1070 Anderlecht
> >>>> Belgium
> >>>> + 32 2 525 02 51
> >>>> + 32 54 43 61 85
> >>>> Thierry.Onkelinx at inbo.be
> >>>> www.inbo.be
> >>>>
> >>>> To call in the statistician after the experiment is done may be no
> more
> >>>> than asking him to perform a post-mortem examination: he may be able
> to
> >>> say
> >>>> what the experiment died of.
> >>>> ~ Sir Ronald Aylmer Fisher
> >>>>
> >>>> The plural of anecdote is not data.
> >>>> ~ Roger Brinner
> >>>>
> >>>> The combination of some data and an aching desire for an answer does
> not
> >>>> ensure that a reasonable answer can be extracted from a given body of
> >>> data.
> >>>> ~ John Tukey
> >>>>
> >>>>
> >>>> -----Oorspronkelijk bericht-----
> >>>> Van: r-sig-ecology-bounces at r-project.org [mailto:
> >>>> r-sig-ecology-bounces at r-project.org] Namens Andrew Halford
> >>>> Verzonden: maandag 20 oktober 2014 16:06
> >>>> Aan: r-sig-ecology at r-project.org
> >>>> Onderwerp: [R-sig-eco] Logistic regression with 2 categorical
> predictors
> >>>>
> >>>> Hi Listers,
> >>>>
> >>>> I am trying to run a logistic regression to look at the effects of
> >>>> experiment type and age on the behavior of fish in a choice chamber
> >>>> experiment.
> >>>>
> >>>> I am using the glm approach and would like some advice on how or
> whether
> >>>> to perform contrasts to work out what levels of Factor1 (Age) and
> >>> Factor 2
> >>>> (Test) are significantly different from each other. I have not been
> able
> >>>> to clarify from my reading what is the appropriate approach to take
> when
> >>>> dealing with a significant interaction term. I am also not sure as to
> >>> how
> >>>> one interprets a model when all the coefficients are non-significant
> but
> >>>> the chi-square ANOVA shows a highly significant interaction term.
> >>>>
> >>>> I have graphed up the data as dot plots and there is definitely
> evidence
> >>>> of changes in proportions in later ages.
> >>>>
> >>>> I want to provide evidence for when and for which tests there was a
> >>>> 'significant' change in behavior.
> >>>>
> >>>>> snapper2
> >>>>   age test prefer avoid
> >>>> 1    1   LR     15    14
> >>>> 2    1   SD     15    13
> >>>> 3    1   SG     17    14
> >>>> 4    1   SW     14    14
> >>>> 5    2   LR     17    14
> >>>> 6    2   SD     16    19
> >>>> 7    2   SG     20    10
> >>>> 8    2   SW     15    21
> >>>> 9    3   LR     10    16
> >>>> 10   3   SD     14    10
> >>>> 11   3   SG     14     9
> >>>> 12   3   SW     13    15
> >>>> 13   4   LR     12    11
> >>>> 14   4   SD     14    11
> >>>> 15   4   SG     13    12
> >>>> 16   4   SW     11    14
> >>>> 17   5   LR      4    12
> >>>> 18   5   SD      8     8
> >>>> 19   5   SG      0    18
> >>>> 20   5   SW     10     6
> >>>> 21   6   LR      0     6
> >>>> 22   6   SD      3     4
> >>>> 23   6   SG      0     5
> >>>> 24   6   SW      5     3
> >>>>
> >>>>>
> >>>>
> >>>>
> >>>
> dotplot(age~prefer/avoid,data=snapper2,group=snapper2$test,cex=1.5,pch=19,ylab="age",auto.key=list(space="right",title="Tests"))
> >>>>
> >>>>
> >>>>> out2 <- glm(cbind(prefer,avoid) ~ age*test, data=snapper2,
> >>>> family=binomial)
> >>>>
> >>>>> summary(out2)
> >>>>
> >>>> Call:
> >>>> glm(formula = cbind(prefer, avoid) ~ age * test, family = binomial,
> >>>>    data = snapper2)
> >>>>
> >>>> Deviance Residuals:
> >>>> [1]  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
> >>> 0
> >>>> 0
> >>>>
> >>>> Coefficients:
> >>>>              Estimate Std. Error z value Pr(>|z|)
> >>>> (Intercept)  6.899e-02  3.716e-01   0.186   0.8527
> >>>> age2         1.252e-01  5.180e-01   0.242   0.8091
> >>>> age3        -5.390e-01  5.483e-01  -0.983   0.3256
> >>>> age4         1.802e-02  5.589e-01   0.032   0.9743
> >>>> age5        -1.168e+00  6.866e-01  -1.701   0.0890 .
> >>>> age6        -2.575e+01  9.348e+04   0.000   0.9998
> >>>> testSD       7.411e-02  5.307e-01   0.140   0.8890
> >>>> testSG       1.252e-01  5.180e-01   0.242   0.8091
> >>>> testSW      -6.899e-02  5.301e-01  -0.130   0.8964
> >>>> age2:testSD -4.401e-01  7.260e-01  -0.606   0.5444
> >>>> age3:testSD  7.324e-01  7.846e-01   0.933   0.3506
> >>>> age4:testSD  8.004e-02  7.863e-01   0.102   0.9189
> >>>> age5:testSD  1.024e+00  9.301e-01   1.102   0.2707
> >>>> age6:testSD  2.532e+01  9.348e+04   0.000   0.9998
> >>>> age2:testSG  3.738e-01  7.407e-01   0.505   0.6138
> >>>> age3:testSG  7.867e-01  7.832e-01   1.004   0.3152
> >>>> age4:testSG -1.321e-01  7.764e-01  -0.170   0.8649
> >>>> age5:testSG -2.568e+01  8.768e+04   0.000   0.9998
> >>>> age6:testSG  2.121e-02  1.334e+05   0.000   1.0000
> >>>> age2:testSW -4.616e-01  7.249e-01  -0.637   0.5242
> >>>> age3:testSW  3.959e-01  7.662e-01   0.517   0.6054
> >>>> age4:testSW -2.592e-01  7.858e-01  -0.330   0.7415
> >>>> age5:testSW  1.678e+00  9.386e-01   1.788   0.0737 .
> >>>> age6:testSW  2.626e+01  9.348e+04   0.000   0.9998
> >>>> ---
> >>>> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
> >>>>
> >>>> (Dispersion parameter for binomial family taken to be 1)
> >>>>
> >>>>    Null deviance: 5.4908e+01  on 23  degrees of freedom Residual
> >>>> deviance: 2.6113e-10  on  0  degrees of freedom
> >>>> AIC: 122.73
> >>>>
> >>>> Number of Fisher Scoring iterations: 23
> >>>>
> >>>>
> >>>>> anova(out2, test="Chisq")
> >>>>
> >>>> Analysis of Deviance Table
> >>>>
> >>>> Model: binomial, link: logit
> >>>>
> >>>> Response: cbind(prefer, avoid)
> >>>>
> >>>> Terms added sequentially (first to last)
> >>>>
> >>>>
> >>>>         Df Deviance Resid. Df Resid. Dev  Pr(>Chi)
> >>>> NULL                        23     54.908
> >>>> age       5   11.235        18     43.673 0.0469115 *
> >>>> test      3    1.593        15     42.079 0.6608887
> >>>> age:test 15   42.079         0      0.000 0.0002185 ***
> >>>> ---
> >>>> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
> >>>>
> >>>> cheers
> >>>>
> >>>> Andy
> >>>>        [[alternative HTML version deleted]]
> >>>>
> >>>> _______________________________________________
> >>>> R-sig-ecology mailing list
> >>>> R-sig-ecology at r-project.org
> >>>> https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
> >>>> * * * * * * * * * * * * * D I S C L A I M E R * * * * * * * * * * * *
> *
> >>>> Dit bericht en eventuele bijlagen geven enkel de visie van de
> schrijver
> >>>> weer en binden het INBO onder geen enkel beding, zolang dit bericht
> niet
> >>>> bevestigd is door een geldig ondertekend document.
> >>>> The views expressed in this message and any annex are purely those of
> >>> the
> >>>> writer and may not be regarded as stating an official position of
> INBO,
> >>> as
> >>>> long as the message is not confirmed by a duly signed document.
> >>>>
> >>>>
> >>>>
> >>>> --
> >>>> Andrew Halford Ph.D
> >>>> Research Scientist (Kimberley Marine Parks)|  Adjunct Research
> Scientist
> >>>> (Curtin University)
> >>>> Dept. Parks and Wildlife
> >>>> Western Australia
> >>>>
> >>>> Ph: +61 8 9219 9795
> >>>> Mobile: +61 (0) 468 419 473
> >>>> * * * * * * * * * * * * * D I S C L A I M E R * * * * * * * * * * * *
> *
> >>>> Dit bericht en eventuele bijlagen geven enkel de visie van de
> schrijver
> >>>> weer en binden het INBO onder geen enkel beding, zolang dit bericht
> niet
> >>>> bevestigd is door een geldig ondertekend document.
> >>>> The views expressed in this message and any annex are purely those of
> >>> the
> >>>> writer and may not be regarded as stating an official position of
> INBO,
> >>> as
> >>>> long as the message is not confirmed by a duly signed document.
> >>>>
> >>>
> >>>
> >>>
> >>> --
> >>> Andrew Halford Ph.D
> >>> Research Scientist (Kimberley Marine Parks)|  Adjunct Research
> Scientist
> >>> (Curtin University)
> >>> Dept. Parks and Wildlife
> >>> Western Australia
> >>>
> >>> Ph: +61 8 9219 9795
> >>> Mobile: +61 (0) 468 419 473
> >>>
> >>>        [[alternative HTML version deleted]]
> >>>
> >>> _______________________________________________
> >>> R-sig-ecology mailing list
> >>> R-sig-ecology at r-project.org
> >>> https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
> >>>
> >>
> >>
> >>
> >> --
> >> Gavin Simpson, PhD
> >>
> >
> >
> >
> > --
> > Andrew Halford Ph.D
> > Research Scientist (Kimberley Marine Parks)|  Adjunct Research Scientist
> > (Curtin University)
> > Dept. Parks and Wildlife
> > Western Australia
> >
> > Ph: +61 8 9219 9795
> > Mobile: +61 (0) 468 419 473
> >
> >       [[alternative HTML version deleted]]
> >
> > _______________________________________________
> > R-sig-ecology mailing list
> > R-sig-ecology at r-project.org
> > https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
>
>


-- 
Andrew Halford Ph.D
Research Scientist (Kimberley Marine Parks)|  Adjunct Research Scientist
(Curtin University)
Dept. Parks and Wildlife
Western Australia

Ph: +61 8 9219 9795
Mobile: +61 (0) 468 419 473

	[[alternative HTML version deleted]]



More information about the R-sig-ecology mailing list