[R-sig-ME] MCMCglmm ZIP longitudinal
David Atkins
datkins at u.washington.edu
Mon Jul 29 21:34:48 CEST 2013
Hi Thomas--
A few free associations (from a psychologist who mostly studies drug and
alcohol addiction...):
A zero-inflated model (as opposed to zero-altered / hurdle / two-part
model) is a mixture model (for the zeroes). Depending on the
distribution of your outcome, it may be that the over-dispersed Poisson
can account for the majority of your zeroes. If that is the case, then
you have relatively little information for the ZI component... which
*might* contribute to poor mixing. Was the mixing better using a
zapoisson model? I have more and more used this type of model,
primarily for easier interpretation.
Similar to David's comment... seems like there are a lot of fixed-effect
terms and interactions. I would take a really thorough descriptive look
at the data to see if it warranted all those terms. We show some code
for doing descriptive plots (as well as model building and MCMCglmm
fitting) in the following article:
Atkins, D. C., Baldwin, S., Zheng, C., Gallop, R. J., & Neighbors, C.
(2013). A tutorial on count regression and zero-altered count models for
longitudinal substance use data. Psychology of Addictive Behaviors, 27,
166-177. doi: 10.1037/a0029508 PMCID: PMC3513584.
R code / datasets can be found:
http://depts.washington.edu/cshrb/statistics-resources-tutorials-tutorial-on-count-regression/
Hope that helps.
cheers, Dave
--
Dave Atkins, PhD
Research Professor
Department of Psychiatry and Behavioral Science
University of Washington
datkins at u.washington.edu
http://depts.washington.edu/cshrb/david-atkins/#more-48
"The combination of some data and an aching desire for an answer does
not ensure that a reasonable answer can be extracted from a given body
of data." -- John Tukey
Hello all,
I have been struggling with some analysis for quite a long time now, and
hoping someone can help give me a little advice as to whether my current
model (using MCMCglmm) seems reasonable, in addition to answering a
question about autocorrelation in the zero-inflation term.
The responses are counts of the time that an individual cricket spent
calling on the day of measurement (each individual is measured at weekly
intervals from 1 week after reaching adulthood until death), and appear
to be highly overdispersed compared to standard Poisson, and also
zero-inflated – I have therefore been using the ZIP model in MCMCglmm.
The ZI component would make sense (I think), because it might just be
that – for example – I recorded a male on the one night that week he
wasn’t singing; the Poisson process in MCMCglmm also deals with
overdispersion well. Each individual was raised on one of two diet
treatments during the nymph stage (ie, before measurements started), and
then assigned to one of two diet treatments again at adulthood. I am
interested in the effect of diet and age (and their interactions) on the
pattern of calling. The data are unbalanced both in terms of how many
individuals are in each treatment group (dependent upon how many
survived the larval stage), and the number of measurements per
individual (which was dependent upon lifespan).
Fixed effects of interest include: juvenile diet treatment, adult diet
treatment, measurement age (for which I am using a second-order
polynomial, as trajectories appear non-linear), and lifespan. I am
including interactions between all of these, except between measurement
age and lifespan (as I think that these must be conflated and result in
overfitting…? Certainly taking these interactions out of the model make
a big change in the results... ).
I am using individual ID in the random effects to group measurements,
and allowing the intercept and the slope associated with measurement age
to vary for each individual (in the Poisson process only). I don’t think
that temporal autocorrelation is likely to be a big issue, as 7 days is
a fairly long time to a cricket (!), and calling is quite variable
between consecutive nights in any case.
Chapters 4 & 5 of Jarrod Hadfield’s excellent course notes have helped a
lot, but I would definitely appreciate any opinion on whether my model
makes sense or not (and some pointers to how to fix it if the latter)!
My main question is really about autocorrelation in the ZI term. I
previously ran the model as ZAP and there was a significant negative
effect in the overall trait ZA term, indicating that there is
zero-inflation in the data… however, in this model I get quite high
autocorrelation values for the ZI term, and it has a much lower
effective sample size than the other fixed effects. I don’t really know
how to explain this – I know Jarrod has said before that ZIP models
don’t mix well, but I’m running a lot of chains and with long burn-in
and thinning intervals (see below)… is this cause for concern, or can I
ignore it? The ZI term is still 'significant' in this, but I'm not
really sure what that means.
(eg a run with 1000000 chain iterations, burn-in of 50000 and thinning
rate of 400 gives autocorrelation of <0.1 for all fixed effects with a
lag of 400, but 0.57 for ‘traitzi_Week’… even a lag of 20000 still has
autocorrelation just above 0.1! It also gives traitzi_Week an effective
sample size of 138, while all fixed effects are ~2000).
I am also wondering whether having a second-order polynomial in the
fixed effects means I should have the same in the random effects, or
whether that even makes any sense?! I’m happy to profess my ignorance
here, so any advice would be gratefully received. I have run the model
with both of these specifications, with similar results for the fixed
effects, but I'm not entirely sure how to interpret the results from the
variance structure when using the second-order polynomial.
The current model specification is below, along with some of the output
to show variance effects and effective sample sizes; I don’t want to
make this email any longer than it already is, but am happy to post the
full output if that would help, or email the data to anyone who would
like to take a look…
Thanks,
Tom
prior.RE.ind <- list(R=list(V=diag(2),
nu=0.002,
fix=2),
G=list(G1=list(V=diag(2),
nu=2,
alpha.mu=c(0,0),
alpha.V=diag(2)*1000)))
mcmc.RE.ind.time.2 <- MCMCglmm(Week ~ trait-1 +
## Time
at.level(trait,1):poly(msmt.age,2,raw=TRUE) +
## Lifespan
at.level(trait,1):Lifespan +
## Treatments
at.level(trait,1):LarvalDiet +
at.level(trait,1):AdultDiet +
at.level(trait,1):LarvalDiet:AdultDiet +
## Interactions between treatments and time
at.level(trait,1):LarvalDiet:poly(msmt.age,2,raw=TRUE) +
at.level(trait,1):AdultDiet:poly(msmt.age,2,raw=TRUE) +
at.level(trait,1):LarvalDiet:AdultDiet:poly(msmt.age,2,raw=TRUE) +
## Interactions between treatments and
lifespan
at.level(trait,1):LarvalDiet: Lifespan +
at.level(trait,1):AdultDiet: Lifespan +
at.level(trait,1):LarvalDiet:AdultDiet:
Lifespan,
random=~us(1+at.level(trait,1):msmt.age):ID,
rcov=~idh(trait):units,
data=call.USA.long,
verbose=FALSE, pr=TRUE,
prior=prior.RE.ind,
family="zipoisson",
saveX=TRUE, saveZ=TRUE,
nitt=iters.1,
thin=sample.1,
burnin=discard.1)
Output:
Iterations = 50001:999601
Thinning interval = 400
Sample size = 2375
DIC: 4680.174
G-structure: ~us(1 + at.level(trait, 1):msmt.age):ID
post.mean
l-95% CI
(Intercept):(Intercept).ID 0.2745801
3.641e-05
at.level(trait, 1):msmt.age:(Intercept).ID -0.0089257
-2.343e-02
(Intercept):at.level(trait, 1):msmt.age.ID -0.0089257
-2.343e-02
at.level(trait, 1):msmt.age:at.level(trait, 1):msmt.age.ID 0.0005993
5.005e-09
u-95% CI
eff.samp
(Intercept):(Intercept).ID 0.605511 28.70
at.level(trait, 1):msmt.age:(Intercept).ID 0.001706 49.36
(Intercept):at.level(trait, 1):msmt.age.ID 0.001706 49.36
at.level(trait, 1):msmt.age:at.level(trait, 1):msmt.age.ID 0.001365 81.63
R-structure: ~idh(trait):units
post.mean l-95% CI u-95% CI eff.samp
Week.units 1.914 1.617 2.244 2214
zi_Week.units 1.000 1.000 1.000 0
eff.samp
traitWeek
2137.8
traitzi_Week
138.2
at.level(trait, 1):poly(msmt.age, 2, raw = TRUE)1
2375.0
at.level(trait, 1):poly(msmt.age, 2, raw = TRUE)2
1895.6
at.level(trait, 1):Lifespan
2045.8
at.level(trait, 1):LarvalDietL
2450.7
at.level(trait, 1):AdultDietL
2081.4
at.level(trait, 1):LarvalDietL:AdultDietL
2375.0
at.level(trait, 1):poly(msmt.age, 2, raw = TRUE)1:LarvalDietL
1959.7
at.level(trait, 1):poly(msmt.age, 2, raw = TRUE)2:LarvalDietL
1487.3
at.level(trait, 1):poly(msmt.age, 2, raw = TRUE)1:AdultDietL
2375.0
at.level(trait, 1):poly(msmt.age, 2, raw = TRUE)2:AdultDietL
1637.2
at.level(trait, 1):Lifespan:LarvalDietL
2086.4
at.level(trait, 1):Lifespan:AdultDietL
1557.8
at.level(trait, 1):poly(msmt.age, 2, raw = TRUE)1:LarvalDietL:AdultDietL
2099.0
at.level(trait, 1):poly(msmt.age, 2, raw = TRUE)2:LarvalDietL:AdultDietL
1462.8
at.level(trait, 1):Lifespan:LarvalDietL:AdultDietL
1645.7
--
The University of Stirling is ranked in the top 50 in the world in The
Times Higher Education 100 Under 50 table, which ranks the world's best
100 universities under 50 years old.
The University of Stirling is a charity registered in Scotland,
number SC 011159.
More information about the R-sig-mixed-models
mailing list