[R-sig-ME] persistant autocorrelation in binomial MCMCglmm

Thu May 4 19:17:00 CEST 2017

Hi Jarrod,

Appreciate the quick response and thoughts

1) I thought I had checked the absolute value of the latent variables, but
now that I look again, I must not have examined both ends of the
distribution. There are 167 observations whose latent variable
distributions dip below -20 (minimum was -32). In each of these cases, the
left tail of the distribution includes very few estimates at such low
values. Do you think this has to do with the rarity of a response of 1 in
the dataset? Or might this be indicative of another problem?

2) I'll give the probit link a try today and see how the results compare

3) I can definitely let the chains run longer, and continue increasing
thinning? I was hesitant to keep upping the values because I haven't seen
many published analyses using iterations and intervals beyond what I've
been trying. I was worried having to run the chain so long might be
indicative of other underlying problems that I wasn't considering.

Many thanks!
Christina

Christina M. Aiello
Biologist- U.S. Geological Survey
Las Vegas Field Station
160 N. Stephanie St.
Henderson, NV 89074
(702) 481-3957
caiello at usgs.gov

On Wed, May 3, 2017 at 9:31 PM, Jarrod Hadfield <j.hadfield at ed.ac.uk> wrote:

> Hi Christina,
>
> 1/ The model syntax looks fine, it is just that MCMCglmm is not very
> efficient for this type of problem. You say there are no numerical issues
> because the latent variable is under 20. However, are they commonly under
> -20?
>
> 2/ Centering and scaling the covariates will not effect the mixing because
> MCMCglmm is block updating all location effects.  Moving from a logistic
> model (family="categorical") to a probit model (family="threshold") will
> probably improve mixing, and the inferences will be pretty much the same.
>
> 3/ You could also up the number of iterations - but perhaps this takes too
> much time? Link and Eaton's recommendation not to thin is fine if you are
> not worried about filling your hard drive. You reduce the Monte Carlo error
> by saving additional correlated samples, but if the total number of samples
> you can store is limited you are better storing uncorrelated samples
> (obtained by thinning) because this reduces the Monte Carlo error more.
>
> Cheers,
>
> Jarrod
>
>
>
>
>
> On 04/05/2017 01:24, Aiello, Christina wrote:
>
>> Dear list,
>>
>> I'm very new to MCMCglmm but have done my best to read-up on Jarrod
>> Hadfield's package documents, tutorials and various examples posted online
>> and discussed on this forum. I'm having trouble fitting what I thought was
>> a fairly simple binomial mixed effects model using MCMCglmm. I'll start by
>> describing the data, then the model, then my problem and questions:
>>
>> My dataset is comprised of unique dyads - pairs of animals located at one
>> of four sites (C1, C2, R1, R2). The response variable, 'contact' indicates
>> that the dyad did (1) or did not (0) interact over the course of the
>> study.
>> The unique id of the members of the dyad are 'tort1' and 'tort2'. Because
>> individuals appear in multiple dyads, I've included a random effect for
>> tortID using the multiple membership function available in the package to
>> account for the non-independence of observations and the fact that some
>> individuals may have a tendency to contact more than others. For fixed
>> effects, in this simplified model I only have one categorical variable,
>> 'site' (which I would have entered as a random effect but I only have 4
>> levels) and one continuous variable, 'overlap' - which is an estimate of
>> space-use similarity for each dyad. I centered and scaled this variable by
>> the non-zero mean value and standard deviation (though I've also tried the
>> model without centering). This may be relevant to my problem: 'overlap's
>> distribution is highly skewed and mostly zero values - similarly, the
>> response variable 'contact' is rare and characterized by mostly zeros.
>>
>> table(datafi$contact, datafi$site)
>>>
>>       C1  C2  R1  R2
>>    0 241 229 176 181
>>    1  35  24  14   9
>>
>> The model:
>>
>> pr<-list( R= list(V=1,  n=0, fix=1), G= list(G1=list(V=1, n=0.002)) )
>>
>> m1 <- MCMCglmm(
>>
>> fixed = contact ~ (1 + site + overlap ) ,
>>
>> random = ~mm(tort1 +tort2),
>>
>> data = datafi,
>>
>> family = "categorical", verbose = FALSE,
>>
>> pr=TRUE, pl=TRUE, prior=pr,
>>
>> nitt=4100000 , thin=2000 , burnin= 100000
>>
>> )
>>
>>> summary(m1)
>>>
>>   Iterations = 100001:4098001
>>   Thinning interval  = 2000
>>   Sample size  = 2000
>>
>>   DIC: 207.4525
>>
>>   G-structure:  ~mm(tort1 + tort2)
>>
>>              post.mean  l-95% CI u-95% CI eff.samp
>> tort1+tort2     2.128 0.0002693    5.488    414.6
>>
>>   R-structure:  ~units
>>
>>        post.mean l-95% CI u-95% CI eff.samp
>> units         1        1        1        0
>>
>>   Location effects: contact ~ (1 + site + overlap)
>>
>>              post.mean l-95% CI u-95% CI eff.samp  pMCMC
>> (Intercept)   -2.2102  -3.6055  -0.7437   1505.6  0.004 **
>> siteC2        -0.4143  -2.7572   1.4982   1808.4  0.708
>> siteR1        -1.2543  -4.0794   0.8424   1069.0  0.268
>> siteR2        -1.4753  -3.9300   0.9782   1348.9  0.205
>> overlap        3.9025   2.8273   5.1260    488.5 <5e-04 ***
>>
>>
>> As far as I can tell, the chains themselves look good and if I run
>> multiple
>> chains and run the Gelman-Rubin diagnostic, the PSRF values are all 1 or
>> 1.01. The parameter estimates are consistent and make sense. The problem
>> lies in the autocorrelation - large amounts in the 'overlap' variable and
>> many of the random intercepts. Here's a sample of the autocorr results:
>>
>> sort(diag(autocorr(m1$Sol)[2,,]))
>>>
>> ##these are the worst offenders
>>    (Intercept)    tort1.4534    tort1.3719      tort1.33    tort1.3620
>>   tort1.30    tort1.3045
>>   0.0926484964  0.0938622549  0.1009204749  0.1049459123  0.1065261665
>>   0.1090179501  0.1237642453
>>         siteR2    tort1.3150    tort1.5579        siteR1    tort1.5473
>>   tort1.2051    tort1.3092
>>   0.1339370132  0.1359132027  0.1383816060  0.1506535457  0.1639062068
>>   0.1682852625  0.1683907054
>>     tort1.5044     tort1.804    tort1.5141    tort1.5103    tort1.4148
>>   tort1.4678    tort1.4428
>>   0.1752670493  0.1767909176  0.1865412328  0.1919929722  0.2257633018
>>   0.2318115800  0.2521806794
>>     tort1.3633    tort1.3335    tort1.5101    tort1.3043      tort1.26
>>   tort1.2014       tort1.6
>>   0.2577034325  0.2593673083  0.2602145001  0.2717718040  0.3487288823
>>   0.3748047689  0.4478979043
>>        overlap
>>   0.5400325556
>>
>>
>> autocorr.diag(m1$VCV)
>>>
>>            tort1+tort2 units
>> Lag 0      1.00000000   NaN
>> Lag 2000   0.58292962   NaN
>> Lag 10000  0.12771910   NaN
>> Lag 20000  0.05262786   NaN
>> Lag 1e+05  0.01757316   NaN
>>
>> I've attempted to fit the model with both uninformative (shown above) and
>> parameter expanded priors (
>> pr2<-list( R= list(V=1, n=0, fix=1), G=list(G1=list(V=1, nu=1, alpha.mu
>> =0,
>> alpha.V=1000)) )), with parameter expanded priors performing slightly
>> worse. I've attempted incrementally larger iteration, thinning, and burn
>> in
>> values, increasing the thinning to as high as 2000 with a large burn-in
>> (100000) in hopes of improving convergence and reducing autocorrelation.
>> I've tried slice sampling and saw little improvement. Nothing I tried
>> while
>> retaining this model structure improved the acfs. I checked the latent
>> variable estimates and all were under 20, with mean of -5.
>>
>> The only way I was able to reduce the autocorrelation was to fit a model
>> without the random effect, which isn't ideal as I'm ignoring repeated
>> measures of individuals among dyads. I've read on this forum that random
>> effects in binomial models are notoriously hard to estimate with this
>> package and I've also read that one should not just increase thinning to
>> deal with the problem (MEE 2012 Link & Eaton
>> <http://onlinelibrary.wiley.com/store/10.1111/j.2041-210X.20
>> 11.00131.x/asset/j.2041-210X.2011.00131.x.pdf?v=1&t=j29nl33
>> 2&s=e0f97f28309122f2bbfa66bccb0cd445696e2f15>).
>> Interestingly, I have count responses association with all interacting
>> dyads and I can fit zero truncated models to those responses just fine
>> with
>> the same fixed and random effects.
>>
>> My questions are then:
>> 1) Do you think there is something inherently wrong with the data or just
>> problems with the mixing algorithms in the context of this data?
>> 2) Are there any other changes to the MCMCglmm specification I might try
>> to
>> improve mixing? Or any problems with my current specification?
>> 3) Any suggestions on where to go from here?
>>
>>
>> I would greatly appreciate any insights and happy to provide further info
>> as needed!
>>
>> Christina
>>
>>         [[alternative HTML version deleted]]
>>
>> _______________________________________________
>> R-sig-mixed-models at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>>
>
>
> --
> The University of Edinburgh is a charitable body, registered in
> Scotland, with registration number SC005336.
>
>
>

	[[alternative HTML version deleted]]