[R-sig-ME] Poor mixing and autocorrelation of ZIP data in MCMCglmm

Wed Jun 17 19:26:08 CEST 2020

   This is a very superficial comment that doesn't really engage with 
the details of your struggle, but in general it makes me sad when people 
drop observations, especially from mixed models, and especially in a 
Bayesian context.  In principle (I know it doesn't always work out this 
way in practice), observations that are only weakly informative are 
still giving you *some* information ... is there any way you can solve 
the problem by setting stronger (i.e. more strongly regularizing) priors ?

On 6/17/20 5:02 AM, Vital Heim wrote:
> Dear Pierre,
>
> Thank you again for the answer and the new prior!
>
> I have been working on the model last week and your question if I have
> enough data got me thinking. I therefore reconsidered my available data and
> also looked into if I can simplify my model.
>
> The model I submitted before had 378 observations. The observations came
> from 28 different individuals. However, I think one problem could have been
> that most of these individuals only had 1 observation because they only
> appeared at a provisioning event once. I therefore decided to remove all
> the individuals that have less than 5 total observations from the data set
> so that theoretically every animal had at least 5 opportunities to make a
> decision to attend (and for how long) a provisioning event or not. That
> left me with 13 individuals and a total of 405 observations (as I do not
> have temperature as a fixed effect I could keep the rows with temperature
> == NA, resulting in more observations than before). The nr. observations
> per individual are:
>
> id   12 13 14 16 17 19 20 21 34 38 39 40 42
> #obs 27 54 67 52 44 34 26 35 41  7  7  5  6
>
> Trying to simplify my model had me removing two fixed effects (temperature
> and gender) from my model formula. That left me with the model:
>
> priorP2 <- list(R = list(V = diag(2), nu = 2, fix = 2),
>                  G = list(
>                  G1 = list(V = diag(2), nu = 1000, alpha.mu = c(0,0),
> alpha.V = diag(2))
>                 ,G2 = list(V = diag(2), nu = 1000, alpha.mu = c(0,0),
> alpha.V = diag(2))))
>
> NITT <- 320000
> BURN <- 20000
> THIN <- 200
> MULT <- 10
>
> model3.1 <- MCMCglmm(bait ~ trait - 1
>                       + trait:individuals + trait:tourists + trait:operators
> + at.level(trait,1):presence
>                       , random = ~idh(trait):id + idh(trait):event
>                       , rcov =~ idh(trait):units
>                       , data=df, family="zipoisson", prior=priorP2
>                       , nitt=NITT*MULT, burnin=BURN*MULT, thin = THIN*MULT,
> verbose=FALSE)
>
> The model resulted in the following summary:
>
>> summary(model3.1)
>
>
>   Iterations = 200001:3198001
>
>   Thinning interval  = 2000
>
>   Sample size  = 1500
>
>
>
>   DIC: 4295.048
>
>
>
>   G-structure:  ~idh(trait):id
>
>
>
>                   post.mean l-95% CI u-95% CI eff.samp
>
> baitIntake.id       0.1969   0.0204   0.5101     1500
>
> zi_baitIntake.id    9.6323   4.3901  17.0756      151
>
>
>
>                 ~idh(trait):dive
>
>
>
>                     post.mean  l-95% CI u-95% CI eff.samp
>
> baitIntake.event       0.2955 1.501e-01   0.4423  1720.47
>
> zi_baitIntake.event    0.6166 1.873e-06   2.2227    91.55
>
>
>
>   R-structure:  ~idh(trait):units
>
>
>
>                      post.mean l-95% CI u-95% CI eff.samp
>
> baitIntake.units       0.5085   0.4217   0.6023     1641
>
> zi_baitIntake.units    1.0000   1.0000   1.0000        0
>
>
>
>   Location effects: bait ~ trait - 1 + trait:individuals + trait:tourists +
> trait:operators + at.level(trait, 1):presence
>
>
>
>                             post.mean  l-95% CI  u-95% CI eff.samp  pMCMC
>
> baitIntake                  5.800698  5.169356  6.423403   1500.0 <7e-04 ***
>
> zi_baitIntake              -0.607708 -3.430961  2.148515    719.3 0.6467
>
> baitIntake:individuals     -0.053430 -0.120854  0.018248   1500.0 0.1347
>
> zi_baitIntake:individuals   0.070798 -0.207224  0.358939    340.2 0.6307
>
> baitIntake:tourists         0.054909  0.024764  0.089098   1500.0 <7e-04 ***
>
> zi_baitIntake:tourists     -0.111844 -0.248268  0.023882    222.7 0.0840 .
>
> baitIntake:operators        0.119909  0.016259  0.208378   1500.0 0.0253 *
>
> zi_baitIntake:operators    -0.181109 -0.552908  0.211214    316.1 0.3520
>
> presence                    0.009881  0.008174  0.011563   1500.0 <7e-04 ***
>
> ---
>
> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
>
> The new model formula and adjusting the df for only animals that have
> attended the events at least 5 times helped I think. The variance in the
> zi.baitintake.id is a lot smaller now (from 140 in the previous model to 9
> in this model). I think there is still a rather large uncertainty in some
> of the fixed effects and the intercept for each trait but am not sure if
> that would be within the range that is acceptable?
>
> The autocorrelation diagnostics looked as follows:
>
>> autocorr.diag(model3.1$Sol)
>              baitIntake zi_baitIntake baitIntake:individuals
> zi_baitIntake:individuals baitIntake:tourists zi_baitIntake:tourists
>
> Lag 0      1.000000000    1.00000000            1.000000000
> 1.000000000         1.000000000              1.0000000
>
> Lag 2000   0.026012481    0.26311228            0.011550676
> 0.566133816        -0.006107718              0.5590814
>
> Lag 10000 -0.036331924    0.01332843            0.003078598
> 0.132021786        -0.023601403              0.1855635
>
> Lag 20000  0.026546529   -0.01065644            0.005180587
> 0.003356647        -0.006827257              0.0705329
>
> Lag 1e+05  0.009254395    0.02910151            0.003647482
>   -0.045349648        -0.033278001             -0.0276361
>
>            baitIntake:operators zi_baitIntake:operators      presence
>
> Lag 0              1.000000000              1.00000000  1.0000000000
>
> Lag 2000           0.017853409              0.53269135  0.0309542019
>
> Lag 10000         -0.001788713              0.15219674  0.0039462749
>
> Lag 20000         -0.067415168              0.04973919 -0.0084329378
>
> Lag 1e+05         -0.010173980              0.03747686 -0.0009911261
>
>
>
>> autocorr.diag(model3.1$VCV)
>            baitIntake.id zi_baitIntake.id baitIntake.event
> zi_baitIntake.event baitIntake.units zi_baitIntake.units
>
> Lag 0       1.000000000      1.000000000      1.000000000
> 1.00000000       1.00000000                 NaN
>
> Lag 2000    0.010988496      0.279553940     -0.006822039
> 0.78582472      -0.04529223                 NaN
>
> Lag 10000   0.005850551      0.255547273     -0.020832316
> 0.48146906       0.01942578                 NaN
>
> Lag 20000   0.006862719      0.175722325     -0.080220078
> 0.29263378      -0.01864084                 NaN
>
> Lag 1e+05   0.036323891      0.003939182     -0.005860269
>   -0.01147487      -0.01569073                 NaN
>
> I attached the trace and density plots to this message. I think overall the
> mixing improved. The mixing of the random effects improved compared to the
> previous model but is still looking a bit off when I compare it to the
> mixing of the fixed effects.
>
> Thank you for your suggestion how to correctly report the prior choice in a
> paper - that is very helpful.
>
> Kind regards,
> Vital
>
> 	[[alternative HTML version deleted]]
>
> _______________________________________________
> R-sig-mixed-models using r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models