[R-sig-ME] ZIP MCMCglmm

Fri Oct 5 12:31:12 CEST 2012

Dear all,
I have several questions on a zero-inflated MCMCglmm model.

Data consist of 764 transects with porpoise counts ("sum_ppho"). On 304 transects zero porpoises were seen. Data come from several studies from the North Sea, they were all sampled in roughly the same area and from altogether 12 years. We are interested in a trend ("year"), seasonal effects ("dayofyear" quadratic (centered) Term + the simple term. Moreover we have a simple geographical variable "we"; a factor with two levels "West" and "east". The transect ID (EFF_Track) I integrated as random factor, assuming that transects are not independant of each other I used the us structure. Most transects were revisited over several years, but not all of them. Since Transects had differing length and visibility conditions, which affected the effective area sampled, I used the effective area ("sqkm") as random slope term. 

in MCMCglmm I tried the following: 

prior<-list(R=list(V=diag(2),nu=0.002), G=list(G1=list(V=diag(2),nu=0.002)))

Mz<-MCMCglmm(sum_ppho~at.level(trait,1):year*at.level(trait,1):we+
+           at.level(trait,1):(I(cdayofyear^2)+ at.level(trait,1):dayofyear), random= ~us(1+sqkm):EFF_Track, 
+              family="zipoisson",
+              data=encounter, rcov=~us(trait):units, prior=prior, verbose=FALSE,
+              nitt=100000,thin =1000, burnin = 5000)
> summary(Mz)

Iterations = 5001:99001
 Thinning interval  = 1000
 Sample size  = 95 

 DIC: 2843.102 

 G-structure:  ~us(1 + sqkm):EFF_Track

                                  post.mean l-95% CI u-95% CI eff.samp
(Intercept):(Intercept).EFF_Track   3.62368  1.21791  6.03721    6.215
sqkm:(Intercept).EFF_Track         -0.38983 -0.61860 -0.15801   14.857
(Intercept):sqkm.EFF_Track         -0.38983 -0.61860 -0.15801   14.857
sqkm:sqkm.EFF_Track                 0.05025  0.02699  0.07369   42.211

 R-structure:  ~us(trait):units

                              post.mean  l-95% CI u-95% CI eff.samp
sum_ppho:sum_ppho.units          0.8150  0.673652   1.0178   69.615
zi_sum_ppho:sum_ppho.units       0.1894 -0.472509   0.6914    7.978
sum_ppho:zi_sum_ppho.units       0.1894 -0.472509   0.6914    7.978
zi_sum_ppho:zi_sum_ppho.units    0.1561  0.001532   0.6305    8.428

 Location effects: sum_ppho ~ at.level(trait, 1):year * at.level(trait, 1):we + at.level(trait, 1):(I(cdayofyear^2) + at.level(trait, 1):dayofyear) 

                                    post.mean   l-95% CI   u-95% CI eff.samp  pMCMC  
(Intercept)                        -3.315e+00 -4.486e+00 -2.045e+00    1.993  <0.01 *
at.level(trait, 1):year             1.673e-03  1.111e-03  2.169e-03    3.976  <0.01 *
at.level(trait, 1):wew             -5.459e+02 -6.696e+02 -4.612e+02   40.273  <0.01 *
at.level(trait, 1):I(cdayofyear^2) -3.117e-05 -5.063e-05 -7.107e-06   95.000  <0.01 *
at.level(trait, 1):dayofyear       -1.726e-03 -3.851e-03  2.335e-05   59.665 0.0632 .
at.level(trait, 1):year:wew         2.727e-01  2.305e-01  3.343e-01   40.388  <0.01 *
---

In general I feel insecure if my assumptions are right and whether prior specifications make sense. But I have two main questions:
1. Question: Am I correct in the assumption, that, if the output for autocorr(M1$Sol) and autocorr(M1$VCV) are low, that spatial autocorrelation is no issue? If I specify a relatively simple model (only main effects), autocorrelation is no issue, however the more interaction terms I include the stronger the autocorrelation becomes. Increasing itterations by a factor of ten to: nitt=100000,thin =5000, burnin = 10000, hardly changed anything about that. Is there anything else I could try? I know that I can increase nitt... further and ask Malcolm about the sir() function. But I'm especially interested, if experimenting with the random term has a chance of improving things? 
2. Question: Following methods of distance sampling, a G(0) factor was estimated compensating for the diving porpoises, which escaped the surveyors. Originally we corrected the observations by dividing the counts by the G(0) factor. So, basically only the transects with counts were corrected. Am I right in the assumption that the latent variable in a ZIP model compensates for all values with false zeros i.e. also the false zero for the third animal in a transect with 2 counted animals? If that's the case, I would just not include any correction for g(0). But if the latent variable only compensates for false zeros of all zero values, but not for other values from the poisson distribution, then I would assume that the correction factor is needed. Is that the difference between a hurdle and a ZIP model? ZIP was in my case better than hurdle according to DIC.

I'm sorry, I hope you can understand my questions. I'm neither a statistician, nor a native speaker, so I would like to appologize for bad choice of words. And please let me know, if a representive sample of my data would help. 
Thank you for reading this, ideas and suggestions are very much appreciated. 

Katrin