[R-sig-ME] Mixed model and negative binomial distribution

Tue Oct 23 17:19:48 CEST 2012

Dave, 

I skimmed your tutorial after you first mentioned it but now am going
through line by line. It explains everything beautifully and simply, I
strongly recommend 

"Atkins, D. C., Baldwin, S., Zheng, C., Gallop, R. J., & Neighbors, C. (in
press). A tutorial on count regression and zero-altered count models for
longitudinal addictions data." 

...for anyone seeking a compact introduction to poisson & negative
binomial-based mixed models, regardless of the field they are in.

Thank you for running my snag data and explaining how to interpret
coefficients of mixed-models based on poisson and negative binomial
distributions.

--Seth W. Bigelow

-----Original Message-----
From: r-sig-mixed-models-bounces at r-project.org
[mailto:r-sig-mixed-models-bounces at r-project.org] On Behalf Of David Atkins
Sent: Tuesday, October 23, 2012 1:35 AM
To: r-sig-mixed-models at r-project.org
Subject: Re: [R-sig-ME] Mixed model and negative binomial distribution

Seth--

It is not clear to me what the disconnect is between the GLMM results 
and what you expect?  For example:

"It's tricky to interpret the log-transformed coefficients and associated
confidence intervals, but they don't seem to reflect the decrease in snags
after treatment that I see when I examine the cell means (e.g., mean East
density at time1 (pre-treatment) is 1.7, dropping to 0.7 at time2 (after
treatment).)"

I fit your model, and here are the results including random-effects:

Coefficients:
                  Estimate Std. Error z value Pr(>|z|)
(Intercept)        -0.764      0.359   -2.13   0.0334 *
as.factor(time)2   -0.588      0.186   -3.16   0.0016 **
as.factor(time)3   -0.604      0.189   -3.20   0.0014 **
EcozoneW            1.249      0.487    2.57   0.0103 *
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Number of observations: total=192, SettingID=64
Random effect variance(s):
Group=SettingID
             Variance StdDev
(Intercept)     2.37   1.54
Negative binomial dispersion parameter: 1.812 (std. err.: 0.33774)

So, we can exponentiate these coefficients to get *conditional* rate 
ratios, or we can include the random-effects to get marginal predictions 
of snag from the model.

Conditional RR:

 > cbind(exp(fixef(my.admb)))
                       [,1]
(Intercept)      0.4660185
as.factor(time)2 0.5551761
as.factor(time)3 0.5467793
EcozoneW         3.4879006

Thus, conditional on the random-effects, there is a 45% decrease in 
snags at time 2.  That seems reasonable given the data, I think?

Marginal predictions:

 > exp(-0.764 + 2.37/2) # marginal intercept
[1] 1.523484

 > exp(-0.764 - 0.588 + 2.37/2) # marginal prediction at time2
[1] 0.8461996

Again, seem sort of reasonable, with the caveat that I am a psychologist 
who studies drug addiction and marital therapy... ;)

If some of your questions are general questions about how to interpret 
the output from Poisson / NB mixed models, then you might take a look at 
a tutorial article that we wrote (with R code), which also points to 
other resources:

http://depts.washington.edu/cshrb/newweb/statstutorials.html

Atkins, D. C., Baldwin, S., Zheng, C., Gallop, R. J., & Neighbors, C. 
(in press). A tutorial on count regression and zero-altered count models 
for longitudinal addictions data.

Granted, the example data come from alcohol focused studies (but 
interestingly enough, it seems the addictions researchers and ecologists 
have similar types of outcomes / models).

If I have missed your question, please let me know.

cheers, Dave

-- 
Dave Atkins, PhD
University of Washington
datkins at u.washington.edu
http://depts.washington.edu/cshrb/

August 1 - October 30:

Universitat Zurich
Psychologisches Institut
Klinische Psychologie
Binzmuhlestrasse 14/23
CH-8050 Zurich

+41 44 635 71 75

Original post:

Dear list: this is a continuation of a previous question. To recap,after
input from several listserv members, I have been using glmmADMB to model
snag (dead tree) densities before, after, and 5 years after forest (factor
variable 'time'). (I apologize for disregarding the advice to use 'BUGS'
approaches -- I didn't have $ for new books or time for assimilation of new
paradigm). The data sampling was stratified by 'Ecozone' ('East' densities
are much lower than 'West'). Because there are many plots with no snags, I
evaluated use of Poisson vs. Negative binomial distribution by finding mean
& variance for each combination of time & Ecozone. AIC for the NB
distribution was vastly smaller (more favorable) than for the Poisson.
Variance tended to be much higher prior to treatment (because a few plots
have large numbers of snags and loggers are instructed to knock most of
these down and simply retain a few).

My statement is glmmadmb(snag~time+Ecozone + (1|SettingID), data=.,
family="nbinom1"). ('SettingID' is a plot identifier).

So, I have these observations/questions. First, when I experiment with
'family="poisson", AIC for the glmm is only 2 AIC units greater than for the
negative binomial model, yet with AIC estimated for individual cells, AIC
Poisson >> AIC NegBinom. Any idea why this might be? (I've tried
zero-inflation for poisson & NB, doesn't seem to make much difference)

It's tricky to interpret the log-transformed coefficients and associated
confidence intervals, but they don't seem to reflect the decrease in snags
after treatment that I see when I examine the cell means (e.g., mean East
density at time1 (pre-treatment) is 1.7, dropping to 0.7 at time2 (after
treatment).) Here's part of the summary statement:

Coefficients:
             Estimate Std. Error z value Pr(>|z|)
(Intercept)     -0.764      0.359   -2.13   0.0334 *
time2             -0.588    0.186   -3.16   0.0016 **
time3            -0.604     0.189   -3.20   0.0014 **
EcozoneW     1.249          0.487    2.57   0.0103 *

 >> exp(coef(.))
(Intercept)       time2       time3    EcozoneW
       0.466       0.555       0.547       3.488

I have tried the full-interaction model ("time*Ecozone"), although it is not
justified from the standpoint of AIC values. This improves the situation
slightly, although confidence intervals do not indicate a significant
decrease in snags after treatment, contrary to what common sense suggests.
Do I need to somehow be modeling the big decrease in variance between time1
and time2 in order to make sense of these data? Data below.
--Seth W. Bigelow

_______________________________________________
R-sig-mixed-models at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models