[R-sig-ME] Poor mixing and autocorrelation of ZIP data in MCMCglmm

Pierre de Villemereuil p|erre@dev|||emereu|| @end|ng |rom ephe@p@|@eu
Thu Jun 18 09:23:45 CEST 2020


Hi,

I agree with Ben, I believe it is a shame to waste that much information. With multiple measurements, *sometimes* removing non-repeated individuals might help improve things, but not always. I would try to include all individuals, and only if you still have issues, start by keeping individuals with at least 2 measurements (5 is too high a threshold in my humble opinion).

Now, for your results, it's nice to see the variance is a bit more reasonable (9 is still a high value, but it is not an impossible one if "id" explains the zeros a lot). However, your effective sample sizes for the zi-related variances is still quite low, so you need to run the MCMC for longer, maybe try to reach at least 200 or 500 (1000 would be a more comfortable target, but it might be difficult to reach if auto-correlation is high).

Cheers,
Pierre.

Le mercredi 17 juin 2020, 11:02:49 CEST Vital Heim a écrit :
> Dear Pierre,
> 
> Thank you again for the answer and the new prior!
> 
> I have been working on the model last week and your question if I have
> enough data got me thinking. I therefore reconsidered my available data and
> also looked into if I can simplify my model.
> 
> The model I submitted before had 378 observations. The observations came
> from 28 different individuals. However, I think one problem could have been
> that most of these individuals only had 1 observation because they only
> appeared at a provisioning event once. I therefore decided to remove all
> the individuals that have less than 5 total observations from the data set
> so that theoretically every animal had at least 5 opportunities to make a
> decision to attend (and for how long) a provisioning event or not. That
> left me with 13 individuals and a total of 405 observations (as I do not
> have temperature as a fixed effect I could keep the rows with temperature
> == NA, resulting in more observations than before). The nr. observations
> per individual are:
> 
> id   12 13 14 16 17 19 20 21 34 38 39 40 42
> #obs 27 54 67 52 44 34 26 35 41  7  7  5  6
> 
> Trying to simplify my model had me removing two fixed effects (temperature
> and gender) from my model formula. That left me with the model:
> 
> priorP2 <- list(R = list(V = diag(2), nu = 2, fix = 2),
>                 G = list(
>                 G1 = list(V = diag(2), nu = 1000, alpha.mu = c(0,0),
> alpha.V = diag(2))
>                ,G2 = list(V = diag(2), nu = 1000, alpha.mu = c(0,0),
> alpha.V = diag(2))))
> 
> NITT <- 320000
> BURN <- 20000
> THIN <- 200
> MULT <- 10
> 
> model3.1 <- MCMCglmm(bait ~ trait - 1
>                      + trait:individuals + trait:tourists + trait:operators
> + at.level(trait,1):presence
>                      , random = ~idh(trait):id + idh(trait):event
>                      , rcov =~ idh(trait):units
>                      , data=df, family="zipoisson", prior=priorP2
>                      , nitt=NITT*MULT, burnin=BURN*MULT, thin = THIN*MULT,
> verbose=FALSE)
> 
> The model resulted in the following summary:
> 
> > summary(model3.1)
> 
> 
> 
>  Iterations = 200001:3198001
> 
>  Thinning interval  = 2000
> 
>  Sample size  = 1500
> 
> 
> 
>  DIC: 4295.048
> 
> 
> 
>  G-structure:  ~idh(trait):id
> 
> 
> 
>                  post.mean l-95% CI u-95% CI eff.samp
> 
> baitIntake.id       0.1969   0.0204   0.5101     1500
> 
> zi_baitIntake.id    9.6323   4.3901  17.0756      151
> 
> 
> 
>                ~idh(trait):dive
> 
> 
> 
>                    post.mean  l-95% CI u-95% CI eff.samp
> 
> baitIntake.event       0.2955 1.501e-01   0.4423  1720.47
> 
> zi_baitIntake.event    0.6166 1.873e-06   2.2227    91.55
> 
> 
> 
>  R-structure:  ~idh(trait):units
> 
> 
> 
>                     post.mean l-95% CI u-95% CI eff.samp
> 
> baitIntake.units       0.5085   0.4217   0.6023     1641
> 
> zi_baitIntake.units    1.0000   1.0000   1.0000        0
> 
> 
> 
>  Location effects: bait ~ trait - 1 + trait:individuals + trait:tourists +
> trait:operators + at.level(trait, 1):presence
> 
> 
> 
>                            post.mean  l-95% CI  u-95% CI eff.samp  pMCMC
> 
> baitIntake                  5.800698  5.169356  6.423403   1500.0 <7e-04 ***
> 
> zi_baitIntake              -0.607708 -3.430961  2.148515    719.3 0.6467
> 
> baitIntake:individuals     -0.053430 -0.120854  0.018248   1500.0 0.1347
> 
> zi_baitIntake:individuals   0.070798 -0.207224  0.358939    340.2 0.6307
> 
> baitIntake:tourists         0.054909  0.024764  0.089098   1500.0 <7e-04 ***
> 
> zi_baitIntake:tourists     -0.111844 -0.248268  0.023882    222.7 0.0840 .
> 
> baitIntake:operators        0.119909  0.016259  0.208378   1500.0 0.0253 *
> 
> zi_baitIntake:operators    -0.181109 -0.552908  0.211214    316.1 0.3520
> 
> presence                    0.009881  0.008174  0.011563   1500.0 <7e-04 ***
> 
> ---
> 
> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
> 
> The new model formula and adjusting the df for only animals that have
> attended the events at least 5 times helped I think. The variance in the
> zi.baitintake.id is a lot smaller now (from 140 in the previous model to 9
> in this model). I think there is still a rather large uncertainty in some
> of the fixed effects and the intercept for each trait but am not sure if
> that would be within the range that is acceptable?
> 
> The autocorrelation diagnostics looked as follows:
> 
> > autocorr.diag(model3.1$Sol)
> 
>             baitIntake zi_baitIntake baitIntake:individuals
> zi_baitIntake:individuals baitIntake:tourists zi_baitIntake:tourists
> 
> Lag 0      1.000000000    1.00000000            1.000000000
> 1.000000000         1.000000000              1.0000000
> 
> Lag 2000   0.026012481    0.26311228            0.011550676
> 0.566133816        -0.006107718              0.5590814
> 
> Lag 10000 -0.036331924    0.01332843            0.003078598
> 0.132021786        -0.023601403              0.1855635
> 
> Lag 20000  0.026546529   -0.01065644            0.005180587
> 0.003356647        -0.006827257              0.0705329
> 
> Lag 1e+05  0.009254395    0.02910151            0.003647482
>  -0.045349648        -0.033278001             -0.0276361
> 
>           baitIntake:operators zi_baitIntake:operators      presence
> 
> Lag 0              1.000000000              1.00000000  1.0000000000
> 
> Lag 2000           0.017853409              0.53269135  0.0309542019
> 
> Lag 10000         -0.001788713              0.15219674  0.0039462749
> 
> Lag 20000         -0.067415168              0.04973919 -0.0084329378
> 
> Lag 1e+05         -0.010173980              0.03747686 -0.0009911261
> 
> 
> 
> > autocorr.diag(model3.1$VCV)
> 
>           baitIntake.id zi_baitIntake.id baitIntake.event
> zi_baitIntake.event baitIntake.units zi_baitIntake.units
> 
> Lag 0       1.000000000      1.000000000      1.000000000
> 1.00000000       1.00000000                 NaN
> 
> Lag 2000    0.010988496      0.279553940     -0.006822039
> 0.78582472      -0.04529223                 NaN
> 
> Lag 10000   0.005850551      0.255547273     -0.020832316
> 0.48146906       0.01942578                 NaN
> 
> Lag 20000   0.006862719      0.175722325     -0.080220078
> 0.29263378      -0.01864084                 NaN
> 
> Lag 1e+05   0.036323891      0.003939182     -0.005860269
>  -0.01147487      -0.01569073                 NaN
> 
> I attached the trace and density plots to this message. I think overall the
> mixing improved. The mixing of the random effects improved compared to the
> previous model but is still looking a bit off when I compare it to the
> mixing of the fixed effects.
> 
> Thank you for your suggestion how to correctly report the prior choice in a
> paper - that is very helpful.
> 
> Kind regards,
> Vital
> 
> 	[[alternative HTML version deleted]]
> 
> _______________________________________________
> R-sig-mixed-models using r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
> 



More information about the R-sig-mixed-models mailing list