# [R-sig-ME] ZIP MCMCglmm

Sun Oct 7 01:07:49 CEST 2012

```Hi,

A) The residual variance for the zero-inflation is not identified nor
is the residual covariation between the zero-inflation and the Poisson
part. However, you are attempting to estimate both. Use:

rcov=idh(trait):units

and fix the [2,2] element of the covariance matrix at something (e.g
1) in the prior (i.e. add fix=2 to your R prior).

B) you have no unique intercepts for the zero inflation and the
Poisson part. You should have trait as a main effect in the fixed
formula (or trait-1 might be easier to interpret).

C)

random= ~us(1+sqkm):EFF_Track

assumes that the regression parameters (intercept and slope) of sqkm
on log Poisson rate and logit zero-inflation are identical within
EFF_Tracks, and that the slope=0 on average (because its not in the
fixed formula). I would have one of the following in the fixed effects:

a) trait:sqkm
b) at.level(trait,1):sqkm
c) at.level(trait,2):sqkm

and one of the following in the random effects.

i) random= ~us(trait):EFF_Track
ii) random= ~idh(trait):EFF_Track
iii) random= ~us(at.level(trait,1)):EFF_Track
iv) random= ~us(at.level(trait,2)):EFF_Track

and make sure that if i) or ii) to have a),  if iii) to have a) or b)
and if iv) to have a) or c).

If you use ZAP rather than ZIP then fixed=~sqkm and random= ~EFF_Track
is OK too.

D) DIC should not be trusted.

E) Not sure, but it sounds like g(0) is correcting for missed
sightings of individuals present (?) and therefore refers to the
Poisson part of the model.  The zero-inflation part may also be due to
missed sightings of individuals present (e.g. on some transects the
observers had their eyes closed!) but it is possibly more likely to be
due to absence of porpoises? I don't know.

Cheers,

Jarrod

Quoting Katrin Ronnenberg <ronnenberg at gmx.com> on Fri, 05 Oct 2012
12:31:12 +0200:

> Dear all,
> I have several questions on a zero-inflated MCMCglmm model.
>
> Data consist of 764 transects with porpoise counts ("sum_ppho"). On
> 304 transects zero porpoises were seen. Data come from several
> studies from the North Sea, they were all sampled in roughly the
> same area and from altogether 12 years. We are interested in a trend
> ("year"), seasonal effects ("dayofyear" quadratic (centered) Term +
> the simple term. Moreover we have a simple geographical variable
> "we"; a factor with two levels "West" and "east". The transect ID
> (EFF_Track) I integrated as random factor, assuming that transects
> are not independant of each other I used the us structure. Most
> transects were revisited over several years, but not all of them.
> Since Transects had differing length and visibility conditions,
> which affected the effective area sampled, I used the effective area
> ("sqkm") as random slope term.
>
>
>
> in MCMCglmm I tried the following:
>
> prior<-list(R=list(V=diag(2),nu=0.002), G=list(G1=list(V=diag(2),nu=0.002)))
>
> Mz<-MCMCglmm(sum_ppho~at.level(trait,1):year*at.level(trait,1):we+
> +           at.level(trait,1):(I(cdayofyear^2)+
> at.level(trait,1):dayofyear), random= ~us(1+sqkm):EFF_Track,
> +              family="zipoisson",
> +              data=encounter, rcov=~us(trait):units, prior=prior,
> verbose=FALSE,
> +              nitt=100000,thin =1000, burnin = 5000)
>> summary(Mz)
>
> Iterations = 5001:99001
>  Thinning interval  = 1000
>  Sample size  = 95
>
>  DIC: 2843.102
>
>  G-structure:  ~us(1 + sqkm):EFF_Track
>
>                                   post.mean l-95% CI u-95% CI eff.samp
> (Intercept):(Intercept).EFF_Track   3.62368  1.21791  6.03721    6.215
> sqkm:(Intercept).EFF_Track         -0.38983 -0.61860 -0.15801   14.857
> (Intercept):sqkm.EFF_Track         -0.38983 -0.61860 -0.15801   14.857
> sqkm:sqkm.EFF_Track                 0.05025  0.02699  0.07369   42.211
>
>  R-structure:  ~us(trait):units
>
>                               post.mean  l-95% CI u-95% CI eff.samp
> sum_ppho:sum_ppho.units          0.8150  0.673652   1.0178   69.615
> zi_sum_ppho:sum_ppho.units       0.1894 -0.472509   0.6914    7.978
> sum_ppho:zi_sum_ppho.units       0.1894 -0.472509   0.6914    7.978
> zi_sum_ppho:zi_sum_ppho.units    0.1561  0.001532   0.6305    8.428
>
>  Location effects: sum_ppho ~ at.level(trait, 1):year *
> at.level(trait, 1):we + at.level(trait, 1):(I(cdayofyear^2) +
> at.level(trait, 1):dayofyear)
>
>                                     post.mean   l-95% CI   u-95% CI
> eff.samp  pMCMC
> (Intercept)                        -3.315e+00 -4.486e+00 -2.045e+00
>   1.993  <0.01 *
> at.level(trait, 1):year             1.673e-03  1.111e-03  2.169e-03
>   3.976  <0.01 *
> at.level(trait, 1):wew             -5.459e+02 -6.696e+02 -4.612e+02
>  40.273  <0.01 *
> at.level(trait, 1):I(cdayofyear^2) -3.117e-05 -5.063e-05 -7.107e-06
>  95.000  <0.01 *
> at.level(trait, 1):dayofyear       -1.726e-03 -3.851e-03  2.335e-05
>  59.665 0.0632 .
> at.level(trait, 1):year:wew         2.727e-01  2.305e-01  3.343e-01
>  40.388  <0.01 *
> ---
>
>
>
>
> In general I feel insecure if my assumptions are right and whether
> prior specifications make sense. But I have two main questions:
> 1. Question: Am I correct in the assumption, that, if the output for
> autocorr(M1\$Sol) and autocorr(M1\$VCV) are low, that spatial
> autocorrelation is no issue? If I specify a relatively simple model
> (only main effects), autocorrelation is no issue, however the more
> interaction terms I include the stronger the autocorrelation
> becomes. Increasing itterations by a factor of ten to:
> nitt=100000,thin =5000, burnin = 10000, hardly changed anything
> about that. Is there anything else I could try? I know that I can
> increase nitt... further and ask Malcolm about the sir() function.
> But I'm especially interested, if experimenting with the random term
> has a chance of improving things?
> 2. Question: Following methods of distance sampling, a G(0) factor
> was estimated compensating for the diving porpoises, which escaped
> the surveyors. Originally we corrected the observations by dividing
> the counts by the G(0) factor. So, basically only the transects with
> counts were corrected. Am I right in the assumption that the latent
> variable in a ZIP model compensates for all values with false zeros
> i.e. also the false zero for the third animal in a transect with 2
> counted animals? If that's the case, I would just not include any
> correction for g(0). But if the latent variable only compensates for
> false zeros of all zero values, but not for other values from the
> poisson distribution, then I would assume that the correction factor
> is needed. Is that the difference between a hurdle and a ZIP model?
> ZIP was in my case better than hurdle according to DIC.
>
> I'm sorry, I hope you can understand my questions. I'm neither a
> statistician, nor a native speaker, so I would like to appologize
> for bad choice of words. And please let me know, if a representive
> sample of my data would help.
> Thank you for reading this, ideas and suggestions are very much appreciated.
>
> Katrin
>
> _______________________________________________
> R-sig-mixed-models at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>
>

--
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.

```