[R-sig-ME] Fwd: model check for negative binomial model

Tue Feb 18 22:25:47 CET 2020

Dear Salvador,

Thank you for the clarification, that was very helpful.

Best wishes

Alessandra

On Mon, Feb 17, 2020 at 9:27 PM Salvador SANCHEZ COLON <
salvadorsanchezcolon using prodigy.net.mx> wrote:

> CORRECTION
>
>
> Dear Alessandra,
>
>
> My apologies, I msread the output you pasted in your previous messages. I
> did not realize that the parameter estimates that you were providing
> correspond to a negative binomial model, I assumed that they came from a
> binomial (logit) model. As Profr. Bolker mentioned, you cannot fit a
> negative binomial model to your data, it has to be a binomial (logit model)
> in which you model the number of eggs that hatch out of the total number of
> eggs in the nest. What the model predicts (after back transforming the odds
> ratio) is the probability, p, of an egg hatching, which can then be
> translated into the expected number of eggs that hatch simply by
> multiplying it by the number of eggs in the nest, np.
>
>
> I hope this helps and, again, my apologies for the confusion and for
> messing things yp.
>
>
> Salvador
>
>
> En Lun, 17 Febrero, 2020 en 20:46, yo <salvadorsanchezcolon using prodigy.net.mx>
> escribió:
>
> Para: bielli.alessandra using gmail.com
> Cc: bbolker using gmail.com; r-sig-mixed-models using r-project.org
>
> Cara Alessandra,
>
>
> If I may interject into your conversation, the key to your question lies
> in the parameter estimates:for the fixed effects:
>
>
>                                Estimate           Std. Error             z
> value        Pr(>|z|)
> (Intercept)               -1.38864             0.08227
> -16.879      < 2e-16 ***
> Relocation..Y.N.Y       0.32105              0.09152                3.508
>      0.000452 ***
> SPL                          0.22218               0.08793
> 2.527       0.011508 *
>
>
> As I understand your design, you have two independent variables (factors):
> Species and treatment, each with two levels: L vs. G and Y vs N.
>
>
> Thus, the logit model that you obtain (omitting the random effects) is:
>
>
> logit(p) = log(p/(1-p) = -1.38864 + 0.32105*Treatment + 0.22218*Species
>
> Treatment and species are two-level factors which are coded as 0 or 1 and
> the model has to have parameter estimates for each of those levels. By
> design, in generalized linear models parameters are estimated by taking one
> of the levels of each factor as the reference level and assigned a value of
> 0, and the parameters for the other levels are expressed in relation to the
> reference level. Thus, in your case, treatment N and species G are
> designated as the reference levels for their corresponding factors and,
> hence, their parameter values are both 0 (and not listed in the output).
> Then, treatment Y has a positive effect (0.32) on the odds ratio of
> hatching/not-hatching compared to treatment N; and species L also has a
> positive effect (0.22 times) on the odds ratio of hatching vs not-hatching.
>
>
> The intercept then (-1.38864) denotes the odds ratio for the combination
> of treatment N and species G; that is, when both factors are at their
> reference level of 0. This translates into a probability of hatching (p):
>
>
> p =  exp(-1.38864)/(1+exp(-1.38864) = 0.1996
>
>
> For treatment Y and species G, the model becomes:
>
>
> p =  exp(-1.38864 + 0.32105)/(1+exp(-1.38864 + 0.32105) = 0.25569
>
>
> as treatment Y has a positive effect on the odds ratio of hatching.
>
>
> For treatment N and species L, the model becomes:
>
>
> p =  exp(-1.38864 + 0.22218)/(1+exp(-1.38864 + 0.22218) = 0.2375
>
>
> as species L has a positive effect on the odds ratio of hatching.
>
>
> Finally, for the combination of treament Y and species L, the model
> becomes:
>
>
> p =  exp(-1.38864 + 0.32105 + 0.22218)/(1+exp(-1.38864 + 0.32105 + 0.22218)
> = 0.3002
>
>
> as both levels Y and L have positive effects on the odds ratio of hatching.
>
>
> I hope this helps.
>
>
> Best regards,
>
>
> Salvador
>
>
>
>
>
>
> En Lun, 17 Febrero, 2020 en 18:2, Alessandra Bielli <
> bielli.alessandra using gmail.com> escribió:
>
> Para: Ben Bolker
> Cc: r-sig-mixed-models using r-project.org
> Dear Ben
>
> I am trying to make a prediction for the combination of species (L or G)
> and treatment (control/experiment).
>
> I am still confused about the prediction values. I would like to present
> results as a success rate for a nest, to say that treatment
> increases/decreases success by ...%. But the value I have is the
> probability that 1 egg in the nest succeeds, correct? I am not sure how to
> use these predictions.
>
> Thanks for your help!
>
> Alessandra
>
> On Mon, Feb 17, 2020 at 2:15 PM Ben Bolker <bbolker using gmail.com> wrote:
>
> >
> > That's correct. There are some delicate issues about prediction:
> >
> > * do you want to use the original (potentially unbalanced) data for
> > prediction? (That's what you're doing here).
> > * or, do you want to make predictions for a "typical" nest and week
> > combination, in which case you would use
> >
> >
> > pframe <- with(your_data,
> > expand.grid(Relocation..Y.N.=unique(Relocation..Y.N.),
> > SP=unique(SP))
> > predict(m.unhatched,type="response",re.form=NA,newdata=pframe))
> >
> > You could also use expand.grid() to generate a balanced design (i.e.
> > all combinations of weeks and nests), which would give yet another
> answer.
> >
> > There are a lot of packages designed for doing these kinds of
> > post-fitting manipulations (e.g. 'margins', 'emmeans'), you might find
> > them useful ...
> >
> >
> >
> > On 2020-02-17 1:48 p.m., Alessandra Bielli wrote:
> > > Dear Thierry
> > >
> > > Thanks for your reply.
> > >
> > > I read a bit about the prediction for a binomial model with
> > > success/failures and I have a couple of questions.
> > >
> > > If I use the predict function with the model you recommended, I obtain
> > > log.odds or probabilities if I use "type=response":
> > >
> > >
> >
> tapply(predict(m.unhatched,type="response"),list(main$SP,main$Relocation..Y.N.),mean)
> > > N Y
> > > G 0.7314196 0.6414554
> > > L 0.6983576 0.6003087
> > >
> > > Are these probabilities of success (i.e. hatched) in one nest?
> > >
> > > Thanks,
> > >
> > > Alessandra
> > >
> > > On Mon, Feb 17, 2020 at 7:18 AM Thierry Onkelinx
> > > <thierry.onkelinx using inbo.be <mailto:thierry.onkelinx using inbo.be>> wrote:
> > >
> > > Dear Alessandra,
> > >
> > > Since you have both the number hatched and the total clutch size you
> > > can calculate the number of successes and failures. That is
> > > sufficient for a binomial distribution.
> > >
> > > glmer(cbind(Hatched, Unhatched) ~ Relocation..Y.N. + SP + (1 |
> > > Beach_ID) + (1 | Week), family = binmial)
> > >
> > > A negative binomial or Poisson allow predictions larger than the
> > > offset. Which is nonsense given that the number hatched cannot
> > > surpass the total clutch size.
> > >
> > > Best regards,
> > >
> > > ir. Thierry Onkelinx
> > > Statisticus / Statistician
> > >
> > > Vlaamse Overheid / Government of Flanders
> > > INSTITUUT VOOR NATUUR- EN BOSONDERZOEK / RESEARCH INSTITUTE FOR
> > > NATURE AND FOREST
> > > Team Biometrie & Kwaliteitszorg / Team Biometrics & Quality Assurance
> > > thierry.onkelinx using inbo.be <mailto:thierry.onkelinx using inbo.be>
> > > Havenlaan 88 bus 73, 1000 Brussel
> > > www.inbo.be <http://www.inbo.be>
> > >
> > >
> >
> ///////////////////////////////////////////////////////////////////////////////////////////
> > > To call in the statistician after the experiment is done may be no
> > > more than asking him to perform a post-mortem examination: he may be
> > > able to say what the experiment died of. ~ Sir Ronald Aylmer Fisher
> > > The plural of anecdote is not data. ~ Roger Brinner
> > > The combination of some data and an aching desire for an answer does
> > > not ensure that a reasonable answer can be extracted from a given
> > > body of data. ~ John Tukey
> > >
> >
> ///////////////////////////////////////////////////////////////////////////////////////////
> > >
> > > <https://www.inbo.be>
> > >
> > >
> > > Op wo 12 feb. 2020 om 18:42 schreef Alessandra Bielli
> > > <bielli.alessandra using gmail.com <mailto:bielli.alessandra using gmail.com>>:
> > >
> > > Dear Ben
> > >
> > > Thanks for your quick response.
> > >
> > > Yes, emergence success is usually between 60 and 80% or higher.
> > > I am not sure how to use a binomial, if my data are counts?
> > >
> > > Can you explain why the approximation doesn't work well if
> > > success gets
> > > much above 50%? Does it make sense, then, to have "unhatched" as
> > > dependent
> > > variable, so that I predict mortality (usually below 50%) using
> > > a nb with
> > > offset(log(total clutch)) ?
> > >
> > > > summary(m.emerged)
> > > Generalized linear mixed model fit by maximum likelihood (Laplace
> > > Approximation) ['glmerMod']
> > > Family: Negative Binomial(2.2104) ( log )
> > > Formula: Unhatched ~ Relocation..Y.N. + SP +
> > > offset(log(Total_Clutch)) +
> > > (1 | Beach_ID) + (1 | Week)
> > > Data: main
> > >
> > > AIC BIC logLik deviance df.resid
> > > 5439.4 5466.0 -2713.7 5427.4 614
> > >
> > > Scaled residuals:
> > > Min 1Q Median 3Q Max
> > > -1.4383 -0.7242 -0.2287 0.4866 4.0531
> > >
> > > Random effects:
> > > Groups Name Variance Std.Dev.
> > > Week (Intercept) 0.003092 0.0556
> > > Beach_ID (Intercept) 0.025894 0.1609
> > > Number of obs: 620, groups: Week, 31; Beach_ID, 8
> > >
> > > Fixed effects:
> > > Estimate Std. Error z value Pr(>|z|)
> > > (Intercept) -1.38864 0.08227 -16.879 < 2e-16 ***
> > > Relocation..Y.N.Y 0.32105 0.09152 3.508 0.000452 ***
> > > SPL 0.22218 0.08793 2.527 0.011508 *
> > > ---
> > > Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
> > >
> > > Correlation of Fixed Effects:
> > > (Intr) R..Y.N
> > > Rlct..Y.N.Y -0.143
> > > SPL -0.540 -0.038
> > >
> > > Thanks,
> > >
> > > Alessandra
> > >
> > > On Tue, Feb 11, 2020 at 7:29 PM Ben Bolker <bbolker using gmail.com
> > > <mailto:bbolker using gmail.com>> wrote:
> > >
> > > >
> > > > Short answer: if emergence success gets much above 50%, then
> > the
> > > > approximation you're making (Poisson + offset for binomial, or
> > > NB +
> > > > offset for negative binomial) doesn't work well. You might
> > try a
> > > > beta-binomial (with glmmTMB) or a binomial + an
> > > observation-level random
> > > > effect.
> > > >
> > > > (On the other hand, your intercept is -0.3, which
> > > corresponds to a
> > > > baseline emergence of 0.42 - not *very* high (but some beaches
> > > and years
> > > > will be well above that ...)
> > > >
> > > > Beyond that, are there any obvious patterns of mis-fit in the
> > > > predicted values ... ?
> > > >
> > > > On 2020-02-11 8:09 p.m., Alessandra Bielli wrote:
> > > > > Dear list
> > > > >
> > > > > I am fitting a poisson model to estimate the effect of a
> > > treatment on
> > > > > emergence success of hatchlings. To estimate emergence
> > > success, I use
> > > > > number of emerged and an offset(log(total clutch).
> > > > >
> > > > > However, overdispersion was detected:
> > > > >
> > > > >> overdisp_fun(m.emerged) #overdispersion detected
> > > > >
> > > > > chisq ratio rdf p
> > > > > 3490.300836 5.684529 614.000000 0.000000
> > > > >
> > > > > Therefore, I switched to a negative binomial. I know
> > > overdispersion is
> > > > not
> > > > > relevant for nb models, but the model plots don't look too
> > > good. I also
> > > > > tried to fit a poisson model with OLRE, but still the plots
> > > don't look
> > > > > good.
> > > > > How do I know if my model is good enough, and what can I do
> > > to improve
> > > > it?
> > > > >
> > > > >> summary(m.emerged)
> > > > > Generalized linear mixed model fit by maximum likelihood
> > > (Laplace
> > > > > Approximation) ['glmerMod']
> > > > > Family: Negative Binomial(7.604) ( log )
> > > > > Formula: Hatched ~ Relocation..Y.N. + SP +
> > > offset(log(Total_Clutch)) + (1
> > > > > |Beach_ID) + (1 | Year)
> > > > > Data: main
> > > > >
> > > > > AIC BIC logLik deviance df.resid
> > > > > 6015.6 6042.2 -3001.8 6003.6 614
> > > > >
> > > > > Scaled residuals:
> > > > > Min 1Q Median 3Q Max
> > > > > -2.6427 -0.3790 0.1790 0.5242 1.6583
> > > > >
> > > > > Random effects:
> > > > > Groups Name Variance Std.Dev.
> > > > > Beach_ID (Intercept) 0.004438 0.06662
> > > > > Year (Intercept) 0.001640 0.04050
> > > > > Number of obs: 620, groups: Beach_ID, 8; Year, 5
> > > > >
> > > > > Fixed effects:
> > > > > Estimate Std. Error z value Pr(>|z|)
> > > > > (Intercept) -0.29915 0.04055 -7.377 1.62e-13 ***
> > > > > Relocation..Y.N.Y -0.16402 0.05052 -3.247 0.00117 **
> > > > > SPL -0.08311 0.04365 -1.904 0.05689 .
> > > > > ---
> > > > > Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’
> > 1
> > > > >
> > > > > Correlation of Fixed Effects:
> > > > > (Intr) R..Y.N
> > > > > Rlct..Y.N.Y -0.114
> > > > > SPL -0.497 -0.054
> > > > >
> > > > >
> > > > > Thanks for your help,
> > > > >
> > > > > Alessandra
> > > > >
> > > > >
> > > > > _______________________________________________
> > > > > R-sig-mixed-models using r-project.org
> > > <mailto:R-sig-mixed-models using r-project.org> mailing list
> > > > > https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
> > > > >
> > > >
> > > > _______________________________________________
> > > > R-sig-mixed-models using r-project.org
> > > <mailto:R-sig-mixed-models using r-project.org> mailing list
> > > > https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
> > > >
> > > _______________________________________________
> > > R-sig-mixed-models using r-project.org
> > > <mailto:R-sig-mixed-models using r-project.org> mailing list
> > > https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
> > >
> >
>
> [[alternative HTML version deleted]]
>
> _______________________________________________
> R-sig-mixed-models using r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
> .
>

	[[alternative HTML version deleted]]