[R-sig-ME] Fit negative binomial glmm credible intervals from the predict function?

Tue Jan 7 03:08:06 CET 2014

Sheryn Olson <sheryn.olson at ...> writes:

> 
> Hello Ben and all,
> 
> Thank you Ben, very much for your previous reply.  Yes, stand/plot makes
> sense.  After talking with a few statisticians, I got convergence!   The
> repeated measure issue is solvable with a dummy variable for each of the 3
> years (like a fiscal or school year) - so yr1 = 2010smr+2011wtr, and so
> on.  Pellet plots were sampled 6 times, 3 each season, vegetation only
> once.  Among year variability can be high.
> 
> I'm now attempting to make predictive plots from models.  I used UCLA's
> method posted at their IRDA site that has a negative binomial example:
> 
>             http://www.ats.ucla.edu/stat/r/dae/nbreg.htm
>            newdata2 <- cbind(newdata2, predict(m1, newdata2,type =
> "link", se.fit=TRUE))
>            newdata2 <- within(newdata2, {
>            DaysAbsent <- exp(fit)
>                  LL <- exp(fit - 1.96 * se.fit)
>                  UL <- exp(fit + 1.96 * se.fit)
>              })
> but I have an uneasy feeling about fitting standard errors from a
> gaussian distribution.

  In case it helps, what you're assuming is Normal is the *sampling
distribution of the parameters*, not the data themselves ...

  The bottom line is that what you're doing seems reasonable, with
the caveats already expressed in ?predict.glmmadmb ...

> In the past I've used MCMCglmm to estimate credible intervals around the
> beta coefficients of a categorical variable, but now,
> I have a continuous variable, conifer sapling count.

  Well, technically it's a discrete (count) variable, not continuous ...

  Section 2 of the glmmADMB vignette [vignette("glmmADMB")] has
a bit of information about running post-fit MCMC.   There should
be an example of how to generate confidence intervals (basically,
you would generate predictions for each sampled set of coefficients
and then compute the quantiles or credible intervals of the
predictions), but there isn't (yet ...)

> So, lots of questions!
> 
> How does one make an accurate predictive graph from a nbinom model?

  Good question.  Not easy, you may need to take some shortcuts
and hope for the best.

> Does it make sense to set up a dataframe with the means of the
> covariates when the covariates'
> distributions were skewed count data?

  It depends what values you want to predict for.  The data frame
typically includes "typical" values of the parameters; you're welcome
to compute predictions for the medians instead if you prefer, or
for a range of values.

> How do I use MCMCglmm correctly?

  ??  Do you mean the 'mcmc' argument of glmmadmb?

> Is "se.fit" valid for the nbinom log link, or....?

  [Update: I just wrote a bunch of stuff about how se.fit doesn't
work for glmmadmb, when it indeed does! However, I had already
written all the stuff below, which describes more generally how
you would do it by hand, so I'm just going to send it anyway]

  The basic recipe for constructing confidence intervals, as laid out
in http://glmm.wikidot.com/faq in the "Predictions and/or confidence
(or prediction) intervals on predictions" section, is

 (1) construct the model matrix (X)
  * If you want predictions on the original model (not this case),
the model.matrix() accessor may work to extract it from the fit for you
  * If not, or if you want to construct predictions for new data,
you need to call model.matrix() with the fixed-effect formula (you
may or may not be able to extract the fixed-effect part of the
formula only from the model, but it's usually fairly easy just
to respecify it

  (2) extract the variance-covariance matrix of the fixed effect
predictors, V (usually just vcov(model))

  (3) extract the fixed-effect coefficients (usually just fixef(model)
or coef(model))

 then the predictions (on the linear predictor scale, at the population
level [i.e. ignoring all random effects]) are

 X %*% beta

(plus an offset if there is one)

and the standard errors of prediction, *CONDITIONAL ON THE
RANDOM EFFECTS* [i.e. ignoring all uncertainty due to uncertainty
of the random effects] are

  sqrt(diag(X %*% V %*% t(X)))

 Then to get confidence intervals you should compute

  inverse_link(fit +/- 1.96*se.fit) exactly as you have below

This assumes further that the sampling distribution of the
coefficients is Normal (in which case the linear computation
we did above will also lead to a Normal distribution, on 
the linear predictor scale).  This might not be true for
small or badly behaved data sets (of course it is never
exactly true except asymptotically ...), but there's not
much you can do about it without working much harder.

> 
> my Code:
> ########## CONIFER SAPLINGS by pellets/ha/month (phm)  #####################
> #### mod1.all <- glmmadmb(pellets ~ season * (t.con.splgs +
> t.dec.trees + pctMidCov + t.BAtrees + cc) +
> 
> ####                     offset(ln.days)+(1|stand/plot)+ (1|hareyr),
> data=hv, family="nbinom")
> predCS <- data.frame(
>                      t.con.splgs = rep(seq(from = min(hv$t.con.splgs),
> to = max(hv$t.con.splgs),length.out = 100), 2),
> 
>                          t.dec.trees=rep(mean(hv$t.dec.trees),200),
>                          pctMidCov=rep(mean(hv$pctMidCov),200),
> 
> t.BAtrees=rep(mean(hv$t.BAtrees),200),cc=rep(mean(hv$cc),200),
> 
>                          ln.days=rep(log(30.25),200),
>                          season = factor(rep(1:2, each = 100), levels
> = 1:2, labels =levels(hv$season)))
> 
> predCS <- cbind(predCS, predict(mod1.all, predCS, type = "link", se.fit=TRUE))
> 
> predCS <- within(predCS, {
>   pellets <- exp(fit)
>   LL <- exp(fit - 1.96 * se.fit)   # these seem wrong !!
>   UL <- exp(fit + 1.96 * se.fit)   # ??
> })

  Do you mean you're suspicious of the code, or that the numbers
it produces seem wrong?

> 
> head(predCS)           #  check to see fits
> 
>   t.con.splgs t.dec.trees pctMidCov t.BAtrees       cc  ln.days season
>       fit    se.fit       UL       LL   pellets
> 1   0.0000000    3.493585  40.80654  4.117168 82.84937 3.409496    smr
> -1.051665 0.5187013 6437.304 842.6536 0.3493557
> 
> 2   0.7142493    3.493585  40.80654  4.117168 82.84937 3.409496    smr
> -1.046329 0.5187194 6471.975 847.1320 0.3512248
> 3   1.4284985    3.493585  40.80654  4.117168 82.84937 3.409496    smr
> -1.040993 0.5187633 6507.163 851.5911 0.3531040
> 
> 4   2.1427478    3.493585  40.80654  4.117168 82.84937 3.409496    smr
> -1.035657 0.5188331 6542.872 856.0304 0.3549932
> 5   2.8569971    3.493585  40.80654  4.117168 82.84937 3.409496    smr
> -1.030321 0.5189286 6579.111 860.4492 0.3568926
> 
> 6   3.5712464    3.493585  40.80654  4.117168 82.84937 3.409496    smr
> -1.024984 0.5190500 6615.885 864.8472 0.3588020
>   con.splgs.ha      phm
> 1      0.00000 2329.038
> 2      5.10152 2341.499
> 3     20.40608 2354.027
> 
> 4     45.91368 2366.622
> 5     81.62432 2379.284
> 6    127.53801 2392.014
> 
> ############# pellets/ha/month (phm) scale up
> #############  and scale up saplings to per ha from 0.1ha plot level
> predCS <- within(predCS, {
> 
>   pellets <- exp(fit)
>   phm <- pellets/1.5*10000
>    LL <- (exp(fit - 1.96 * se.fit))/1.5*10000
>    UL <- (exp(fit + 1.96 * se.fit))/1.5*10000
>    con.splgs.ha <- 10*(t.con.splgs^2)          # backtransform square
> root and scale up
> 
> })
> CS <- ggplot(predCS, aes(con.splgs.ha, phm)) +
> 
>   theme_bw() +   #eliminates background, gridlines, but not border
>   theme(
>     plot.background = element_blank()
>    ,panel.grid.major = element_blank()
> 
>    ,panel.grid.minor = element_blank()
> #   ,panel.border = element_blank()
>    ,panel.background = element_blank()
>   ) +
> 
>   geom_ribbon(aes(ymin = LL, ymax = UL, fill = season),alpha = .25) +
>   #geom_line(aes(colour = season),size = 1) +
> 
>   geom_smooth(aes(colour=season, linetype=season),size = 1, se = F) +
>   scale_colour_manual(values=c("wtr"= 4, "smr" = 3)) +
>   scale_x_continuous(limits = c(0, 25000)) +
>   scale_y_continuous(limits = c(0, 12000)) +
> 
>   scale_fill_brewer(palette="Accent") +
> 
>   theme(legend.position = 'none') +         #get rid of the legend
>   labs(x = "Conifer Sapling Count per ha", y = "Predicted Pellets/ha/month")
> 
> print(CS)
> 
> Thanks for any ideas.
> Sheryn
>