[R-sig-ME] Fwd: model check for negative binomial model

Mon Feb 17 21:15:52 CET 2020

    That's correct.  There are some delicate issues about prediction:

* do you want to use the original (potentially unbalanced) data for
prediction? (That's what you're doing here).
* or, do you want to make predictions for a "typical" nest and week
combination, in which case you would use

  pframe <- with(your_data,
        expand.grid(Relocation..Y.N.=unique(Relocation..Y.N.),
            SP=unique(SP))
  predict(m.unhatched,type="response",re.form=NA,newdata=pframe))

  You could also use expand.grid() to generate a balanced design (i.e.
all combinations of weeks and nests), which would give yet another answer.

  There are a lot of packages designed for doing these kinds of
post-fitting manipulations (e.g. 'margins', 'emmeans'), you might find
them useful ...

On 2020-02-17 1:48 p.m., Alessandra Bielli wrote:
> Dear Thierry
> 
> Thanks for your reply.
> 
> I read a bit about the prediction for a binomial model with
> success/failures and I have a couple of questions. 
> 
> If I use the predict function with the model you recommended, I obtain
> log.odds or probabilities if I use "type=response":
> 
> tapply(predict(m.unhatched,type="response"),list(main$SP,main$Relocation..Y.N.),mean)
>           N         Y
> G 0.7314196 0.6414554
> L 0.6983576 0.6003087
> 
> Are these probabilities of success (i.e. hatched) in one nest? 
> 
> Thanks,
> 
> Alessandra
> 
> On Mon, Feb 17, 2020 at 7:18 AM Thierry Onkelinx
> <thierry.onkelinx using inbo.be <mailto:thierry.onkelinx using inbo.be>> wrote:
> 
>     Dear Alessandra,
> 
>     Since you have both the number hatched and the total clutch size you
>     can calculate the number of successes and failures. That is
>     sufficient for a binomial distribution.
> 
>     glmer(cbind(Hatched, Unhatched) ~ Relocation..Y.N. + SP + (1 |
>     Beach_ID) + (1 | Week), family = binmial)
> 
>     A negative binomial or Poisson allow predictions larger than the
>     offset. Which is nonsense given that the number hatched cannot
>     surpass the total clutch size.
> 
>     Best regards,
> 
>     ir. Thierry Onkelinx
>     Statisticus / Statistician
> 
>     Vlaamse Overheid / Government of Flanders
>     INSTITUUT VOOR NATUUR- EN BOSONDERZOEK / RESEARCH INSTITUTE FOR
>     NATURE AND FOREST
>     Team Biometrie & Kwaliteitszorg / Team Biometrics & Quality Assurance
>     thierry.onkelinx using inbo.be <mailto:thierry.onkelinx using inbo.be>
>     Havenlaan 88 bus 73, 1000 Brussel
>     www.inbo.be <http://www.inbo.be>
> 
>     ///////////////////////////////////////////////////////////////////////////////////////////
>     To call in the statistician after the experiment is done may be no
>     more than asking him to perform a post-mortem examination: he may be
>     able to say what the experiment died of. ~ Sir Ronald Aylmer Fisher
>     The plural of anecdote is not data. ~ Roger Brinner
>     The combination of some data and an aching desire for an answer does
>     not ensure that a reasonable answer can be extracted from a given
>     body of data. ~ John Tukey
>     ///////////////////////////////////////////////////////////////////////////////////////////
> 
>     <https://www.inbo.be>
> 
> 
>     Op wo 12 feb. 2020 om 18:42 schreef Alessandra Bielli
>     <bielli.alessandra using gmail.com <mailto:bielli.alessandra using gmail.com>>:
> 
>         Dear Ben
> 
>         Thanks for your quick response.
> 
>         Yes, emergence success is usually between 60 and 80% or higher.
>         I am not sure how to use a binomial, if my data are counts?
> 
>         Can you explain why the approximation doesn't work well if
>         success gets
>         much above 50%? Does it make sense, then, to have "unhatched" as
>         dependent
>         variable, so that I predict mortality (usually below 50%) using
>         a nb with
>         offset(log(total clutch)) ?
> 
>         > summary(m.emerged)
>         Generalized linear mixed model fit by maximum likelihood (Laplace
>         Approximation) ['glmerMod']
>          Family: Negative Binomial(2.2104)  ( log )
>         Formula: Unhatched ~ Relocation..Y.N. + SP +
>         offset(log(Total_Clutch)) +
>            (1 | Beach_ID) + (1 | Week)
>            Data: main
> 
>              AIC      BIC   logLik deviance df.resid
>           5439.4   5466.0  -2713.7   5427.4      614
> 
>         Scaled residuals:
>             Min      1Q  Median      3Q     Max
>         -1.4383 -0.7242 -0.2287  0.4866  4.0531
> 
>         Random effects:
>          Groups   Name        Variance Std.Dev.
>          Week     (Intercept) 0.003092 0.0556
>          Beach_ID (Intercept) 0.025894 0.1609
>         Number of obs: 620, groups:  Week, 31; Beach_ID, 8
> 
>         Fixed effects:
>                           Estimate Std. Error z value Pr(>|z|)
>         (Intercept)       -1.38864    0.08227 -16.879  < 2e-16 ***
>         Relocation..Y.N.Y  0.32105    0.09152   3.508 0.000452 ***
>         SPL                0.22218    0.08793   2.527 0.011508 *
>         ---
>         Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
> 
>         Correlation of Fixed Effects:
>                     (Intr) R..Y.N
>         Rlct..Y.N.Y -0.143
>         SPL         -0.540 -0.038
> 
>         Thanks,
> 
>         Alessandra
> 
>         On Tue, Feb 11, 2020 at 7:29 PM Ben Bolker <bbolker using gmail.com
>         <mailto:bbolker using gmail.com>> wrote:
> 
>         >
>         >   Short answer: if emergence success gets much above 50%, then the
>         > approximation you're making (Poisson + offset for binomial, or
>         NB +
>         > offset for negative binomial) doesn't work well.  You might try a
>         > beta-binomial (with glmmTMB) or a binomial + an
>         observation-level random
>         > effect.
>         >
>         >   (On the other hand, your intercept is -0.3, which
>         corresponds to a
>         > baseline emergence of 0.42 - not *very* high (but some beaches
>         and years
>         > will be well above that ...)
>         >
>         >   Beyond that, are there any obvious patterns of mis-fit in the
>         > predicted values ... ?
>         >
>         > On 2020-02-11 8:09 p.m., Alessandra Bielli wrote:
>         > > Dear list
>         > >
>         > > I am fitting a poisson model to estimate the effect of a
>         treatment on
>         > > emergence success of hatchlings. To estimate emergence
>         success, I use
>         > > number of emerged and an offset(log(total clutch).
>         > >
>         > > However, overdispersion was detected:
>         > >
>         > >> overdisp_fun(m.emerged) #overdispersion detected
>         > >
>         > >       chisq       ratio         rdf           p
>         > > 3490.300836    5.684529  614.000000    0.000000
>         > >
>         > > Therefore, I switched to a negative binomial. I know
>         overdispersion is
>         > not
>         > > relevant for nb models, but the model plots don't look too
>         good. I also
>         > > tried to fit a poisson model with OLRE, but still the  plots
>         don't look
>         > > good.
>         > > How do I know if my model is good enough, and what can I do
>         to improve
>         > it?
>         > >
>         > >> summary(m.emerged)
>         > > Generalized linear mixed model fit by maximum likelihood
>         (Laplace
>         > > Approximation) ['glmerMod']
>         > >  Family: Negative Binomial(7.604)  ( log )
>         > > Formula: Hatched ~ Relocation..Y.N. + SP +
>         offset(log(Total_Clutch)) + (1
>         > > |Beach_ID) + (1 | Year)
>         > >    Data: main
>         > >
>         > >      AIC      BIC   logLik deviance df.resid
>         > >   6015.6   6042.2  -3001.8   6003.6      614
>         > >
>         > > Scaled residuals:
>         > >     Min      1Q  Median      3Q     Max
>         > > -2.6427 -0.3790  0.1790  0.5242  1.6583
>         > >
>         > > Random effects:
>         > >  Groups   Name        Variance Std.Dev.
>         > >  Beach_ID (Intercept) 0.004438 0.06662
>         > >  Year     (Intercept) 0.001640 0.04050
>         > > Number of obs: 620, groups:  Beach_ID, 8; Year, 5
>         > >
>         > > Fixed effects:
>         > >                   Estimate Std. Error z value Pr(>|z|)
>         > > (Intercept)       -0.29915    0.04055  -7.377 1.62e-13 ***
>         > > Relocation..Y.N.Y -0.16402    0.05052  -3.247  0.00117 **
>         > > SPL               -0.08311    0.04365  -1.904  0.05689 .
>         > > ---
>         > > Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
>         > >
>         > > Correlation of Fixed Effects:
>         > >             (Intr) R..Y.N
>         > > Rlct..Y.N.Y -0.114
>         > > SPL         -0.497 -0.054
>         > >
>         > >
>         > > Thanks for your help,
>         > >
>         > > Alessandra
>         > >
>         > >
>         > > _______________________________________________
>         > > R-sig-mixed-models using r-project.org
>         <mailto:R-sig-mixed-models using r-project.org> mailing list
>         > > https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>         > >
>         >
>         > _______________________________________________
>         > R-sig-mixed-models using r-project.org
>         <mailto:R-sig-mixed-models using r-project.org> mailing list
>         > https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>         >
>         _______________________________________________
>         R-sig-mixed-models using r-project.org
>         <mailto:R-sig-mixed-models using r-project.org> mailing list
>         https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>