[R-sig-ME] MCMCglmm ZIP longitudinal

Mon Jul 29 21:34:48 CEST 2013

Hi Thomas--

A few free associations (from a psychologist who mostly studies drug and 
alcohol addiction...):

A zero-inflated model (as opposed to zero-altered / hurdle / two-part 
model) is a mixture model (for the zeroes).  Depending on the 
distribution of your outcome, it may be that the over-dispersed Poisson 
can account for the majority of your zeroes.  If that is the case, then 
you have relatively little information for the ZI component... which 
*might* contribute to poor mixing.  Was the mixing better using a 
zapoisson model?  I have more and more used this type of model, 
primarily for easier interpretation.

Similar to David's comment... seems like there are a lot of fixed-effect 
terms and interactions.  I would take a really thorough descriptive look 
at the data to see if it warranted all those terms.  We show some code 
for doing descriptive plots (as well as model building and MCMCglmm 
fitting) in the following article:

Atkins, D. C., Baldwin, S., Zheng, C., Gallop, R. J., & Neighbors, C. 
(2013). A tutorial on count regression and zero-altered count models for 
longitudinal substance use data.  Psychology of Addictive Behaviors, 27, 
166-177. doi: 10.1037/a0029508 PMCID: PMC3513584.

R code / datasets can be found:

http://depts.washington.edu/cshrb/statistics-resources-tutorials-tutorial-on-count-regression/

Hope that helps.

cheers, Dave

-- 
Dave Atkins, PhD

Research Professor
Department of Psychiatry and Behavioral Science
University of Washington
datkins at u.washington.edu
http://depts.washington.edu/cshrb/david-atkins/#more-48

"The combination of some data and an aching desire for an answer does 
not ensure that a reasonable answer can be extracted from a given body 
of data." -- John Tukey

Hello all,

I have been struggling with some analysis for quite a long time now, and 
hoping someone can help give me a little advice as to whether my current 
model (using MCMCglmm) seems reasonable, in addition to answering a 
question about autocorrelation in the zero-inflation term.

The responses are counts of the time that an individual cricket spent 
calling on the day of measurement (each individual is measured at weekly 
intervals from 1 week after reaching adulthood until death), and appear 
to be highly overdispersed compared to standard Poisson, and also 
zero-inflated – I have therefore been using the ZIP model in MCMCglmm. 
The ZI component would make sense (I think), because it might just be 
that – for example – I recorded a male on the one night that week he 
wasn’t singing; the Poisson process in MCMCglmm also deals with 
overdispersion well. Each individual was raised on one of two diet 
treatments during the nymph stage (ie, before measurements started), and 
then assigned to one of two diet treatments again at adulthood. I am 
interested in the effect of diet and age (and their interactions) on the 
pattern of calling. The data are unbalanced both in terms of how many 
individuals are in each treatment group (dependent upon how many 
survived the larval stage), and the number of measurements per 
individual (which was dependent upon lifespan).

Fixed effects of interest include: juvenile diet treatment, adult diet 
treatment, measurement age (for which I am using a second-order 
polynomial, as trajectories appear non-linear), and lifespan. I am 
including interactions between all of these, except between measurement 
age and lifespan (as I think that these must be conflated and result in 
overfitting…? Certainly taking these interactions out of the model make 
a big change in the results... ).

I am using individual ID in the random effects to group measurements, 
and allowing the intercept and the slope associated with measurement age 
to vary for each individual (in the Poisson process only). I don’t think 
that temporal autocorrelation is likely to be a big issue, as 7 days is 
a fairly long time to a cricket (!), and calling is quite variable 
between consecutive nights in any case.

Chapters 4 & 5 of Jarrod Hadfield’s excellent course notes have helped a 
lot, but I would definitely appreciate any opinion on whether my model 
makes sense or not (and some pointers to how to fix it if the latter)!

My main question is really about autocorrelation in the ZI term. I 
previously ran the model as ZAP and there was a significant negative 
effect in the overall trait ZA term, indicating that there is 
zero-inflation in the data… however, in this model I get quite high 
autocorrelation values for the ZI term, and it has a much lower 
effective sample size than the other fixed effects. I don’t really know 
how to explain this – I know Jarrod has said before that ZIP models 
don’t mix well, but I’m running a lot of chains and with long burn-in 
and thinning intervals (see below)… is this cause for concern, or can I 
ignore it? The ZI term is still 'significant' in this, but I'm not 
really sure what that means.

(eg a run with 1000000 chain iterations, burn-in of 50000 and thinning 
rate of 400 gives autocorrelation of <0.1 for all fixed effects with a 
lag of 400, but 0.57 for ‘traitzi_Week’… even a lag of 20000 still has 
autocorrelation just above 0.1! It also gives traitzi_Week an effective 
sample size of 138, while all fixed effects are ~2000).

I am also wondering whether having a second-order polynomial in the 
fixed effects means I should have the same in the random effects, or 
whether that even makes any sense?! I’m happy to profess my ignorance 
here, so any advice would be gratefully received. I have run the model 
with both of these specifications, with similar results for the fixed 
effects, but I'm not entirely sure how to interpret the results from the 
variance structure when using the second-order polynomial.

The current model specification is below, along with some of the output 
to show variance effects and effective sample sizes; I don’t want to 
make this email any longer than it already is, but am happy to post the 
full output if that would help, or email the data to anyone who would 
like to take a look…

Thanks,

Tom

prior.RE.ind <- list(R=list(V=diag(2),
                             nu=0.002,
                             fix=2),
                      G=list(G1=list(V=diag(2),
                                     nu=2,
                                     alpha.mu=c(0,0),
                                     alpha.V=diag(2)*1000)))

mcmc.RE.ind.time.2 <- MCMCglmm(Week ~ trait-1 +
                                ## Time

at.level(trait,1):poly(msmt.age,2,raw=TRUE) +
                                ## Lifespan
                                at.level(trait,1):Lifespan +
                                ## Treatments
                                at.level(trait,1):LarvalDiet +
                                at.level(trait,1):AdultDiet +
                                at.level(trait,1):LarvalDiet:AdultDiet +
                                ## Interactions between treatments and time

at.level(trait,1):LarvalDiet:poly(msmt.age,2,raw=TRUE) +

at.level(trait,1):AdultDiet:poly(msmt.age,2,raw=TRUE) +

at.level(trait,1):LarvalDiet:AdultDiet:poly(msmt.age,2,raw=TRUE) +
                                ## Interactions between treatments and 
lifespan
                                at.level(trait,1):LarvalDiet: Lifespan +
                                at.level(trait,1):AdultDiet: Lifespan +
                                at.level(trait,1):LarvalDiet:AdultDiet: 
Lifespan,
                              random=~us(1+at.level(trait,1):msmt.age):ID,
                              rcov=~idh(trait):units,
                              data=call.USA.long,
                              verbose=FALSE, pr=TRUE,
                              prior=prior.RE.ind,
                              family="zipoisson",
                              saveX=TRUE, saveZ=TRUE,
                              nitt=iters.1,
                              thin=sample.1,
                              burnin=discard.1)

Output:

  Iterations = 50001:999601
  Thinning interval  = 400
  Sample size  = 2375

  DIC: 4680.174

  G-structure:  ~us(1 + at.level(trait, 1):msmt.age):ID

                                                             post.mean 
  l-95% CI
(Intercept):(Intercept).ID                                  0.2745801 
3.641e-05
at.level(trait, 1):msmt.age:(Intercept).ID                 -0.0089257 
-2.343e-02
(Intercept):at.level(trait, 1):msmt.age.ID                 -0.0089257 
-2.343e-02
at.level(trait, 1):msmt.age:at.level(trait, 1):msmt.age.ID  0.0005993 
5.005e-09

                                                            u-95% CI 
eff.samp
(Intercept):(Intercept).ID                                 0.605511    28.70
at.level(trait, 1):msmt.age:(Intercept).ID                 0.001706    49.36
(Intercept):at.level(trait, 1):msmt.age.ID                 0.001706    49.36
at.level(trait, 1):msmt.age:at.level(trait, 1):msmt.age.ID 0.001365    81.63

  R-structure:  ~idh(trait):units

               post.mean l-95% CI u-95% CI eff.samp
Week.units        1.914    1.617    2.244     2214
zi_Week.units     1.000    1.000    1.000        0

    eff.samp
traitWeek 
    2137.8
traitzi_Week 
     138.2
at.level(trait, 1):poly(msmt.age, 2, raw = TRUE)1 
    2375.0
at.level(trait, 1):poly(msmt.age, 2, raw = TRUE)2 
    1895.6
at.level(trait, 1):Lifespan 
2045.8
at.level(trait, 1):LarvalDietL 
    2450.7
at.level(trait, 1):AdultDietL 
    2081.4
at.level(trait, 1):LarvalDietL:AdultDietL 
    2375.0
at.level(trait, 1):poly(msmt.age, 2, raw = TRUE)1:LarvalDietL 
    1959.7
at.level(trait, 1):poly(msmt.age, 2, raw = TRUE)2:LarvalDietL 
    1487.3
at.level(trait, 1):poly(msmt.age, 2, raw = TRUE)1:AdultDietL 
    2375.0
at.level(trait, 1):poly(msmt.age, 2, raw = TRUE)2:AdultDietL 
    1637.2
at.level(trait, 1):Lifespan:LarvalDietL 
2086.4
at.level(trait, 1):Lifespan:AdultDietL 
1557.8
at.level(trait, 1):poly(msmt.age, 2, raw = TRUE)1:LarvalDietL:AdultDietL 
   2099.0
at.level(trait, 1):poly(msmt.age, 2, raw = TRUE)2:LarvalDietL:AdultDietL 
   1462.8
at.level(trait, 1):Lifespan:LarvalDietL:AdultDietL 
1645.7

-- 
The University of Stirling is ranked in the top 50 in the world in The 
Times Higher Education 100 Under 50 table, which ranks the world's best 
100 universities under 50 years old.
The University of Stirling is a charity registered in Scotland,
  number SC 011159.