[R-sig-ME] persistant autocorrelation in binomial MCMCglmm
Aiello, Christina
caiello at usgs.gov
Thu May 4 19:17:00 CEST 2017
Hi Jarrod,
Appreciate the quick response and thoughts
1) I thought I had checked the absolute value of the latent variables, but
now that I look again, I must not have examined both ends of the
distribution. There are 167 observations whose latent variable
distributions dip below -20 (minimum was -32). In each of these cases, the
left tail of the distribution includes very few estimates at such low
values. Do you think this has to do with the rarity of a response of 1 in
the dataset? Or might this be indicative of another problem?
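For reference, since the model was run with pl = TRUE, the saved latent draws can be screened directly. A minimal base-R sketch of that check - note the matrix `Liab` below is simulated stand-in data of roughly the right shape, not the real m1$Liab:

```r
# Sketch of the latent-variable check, using simulated stand-in data.
# ("Liab" mimics m1$Liab from pl = TRUE: rows = saved iterations,
#  columns = observations; the real check would use m1$Liab directly.)
set.seed(1)
Liab <- matrix(rnorm(2000 * 300, mean = -5, sd = 4), nrow = 2000)

obs_min <- apply(Liab, 2, min)   # per-observation minimum latent value
flagged <- which(obs_min < -20)  # observations whose left tail dips below -20

length(flagged)   # number of affected observations
range(obs_min)    # overall extremes of the latent draws
```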
2) I'll give the probit link a try today and see how the results compare
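(A quick base-R sanity check on why the probit results should be comparable: the logistic CDF is closely approximated by a probit whose linear predictor is shrunk by a factor of roughly 1.7, so "threshold" fixed effects should land near the "categorical" estimates divided by about 1.7. Illustration only, not MCMCglmm output:)

```r
# Logit vs. rescaled probit: the two link functions nearly coincide,
# which is why moving from family = "categorical" to family = "threshold"
# changes the latent scale but not the substantive inferences.
x <- seq(-6, 6, by = 0.01)
logit_p  <- plogis(x)        # logistic CDF
probit_p <- pnorm(x / 1.7)   # probit CDF with predictor shrunk by ~1.7

max(abs(logit_p - probit_p)) # worst-case disagreement over the grid
```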
3) I can definitely let the chains run longer and continue increasing the
thinning interval. I was hesitant to keep upping these values because I
haven't seen many published analyses using iterations and intervals beyond
what I've been trying, and I worried that having to run the chain so long
might be indicative of other underlying problems I wasn't considering.
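(For what it's worth, the storage trade-off can be seen in a toy simulation - an AR(1) chain standing in for a sticky sampler, nothing to do with the actual model: at a fixed budget of stored samples, thinned draws carry far more effective information.)

```r
# Toy demo of thinning under a fixed storage budget: simulate a strongly
# autocorrelated AR(1) "chain" and compare the effective sample size of
# 2000 consecutive draws vs. 2000 thinned draws (every 100th).
set.seed(42)
n     <- 200000
chain <- as.numeric(arima.sim(list(ar = 0.95), n = n))

# Crude effective-sample-size estimate from the lag-1 autocorrelation
ess <- function(x) {
  r1 <- acf(x, lag.max = 1, plot = FALSE)$acf[2]
  length(x) * (1 - r1) / (1 + r1)
}

budget    <- 2000
unthinned <- chain[seq_len(budget)]      # 2000 consecutive, correlated draws
thinned   <- chain[seq(1, n, by = 100)]  # 2000 draws thinned by 100

ess(unthinned)  # far below the 2000 samples stored
ess(thinned)    # close to the 2000 samples stored
```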
Many thanks!
Christina
Christina M. Aiello
Biologist- U.S. Geological Survey
Las Vegas Field Station
160 N. Stephanie St.
Henderson, NV 89074
(702) 481-3957
caiello at usgs.gov
On Wed, May 3, 2017 at 9:31 PM, Jarrod Hadfield <j.hadfield at ed.ac.uk> wrote:
> Hi Christina,
>
> 1/ The model syntax looks fine; it is just that MCMCglmm is not very
> efficient for this type of problem. You say there are no numerical issues
> because the latent variables are under 20. However, are they commonly under
> -20?
>
> 2/ Centering and scaling the covariates will not affect the mixing because
> MCMCglmm is block updating all location effects. Moving from a logistic
> model (family="categorical") to a probit model (family="threshold") will
> probably improve mixing, and the inferences will be pretty much the same.
>
> 3/ You could also up the number of iterations - but perhaps this takes too
> much time? Link and Eaton's recommendation not to thin is fine if you are
> not worried about filling your hard drive. You reduce the Monte Carlo error
> by saving additional correlated samples, but if the total number of samples
> you can store is limited you are better off storing uncorrelated samples
> (obtained by thinning) because this reduces the Monte Carlo error more.
>
> Cheers,
>
> Jarrod
>
>
>
>
>
> On 04/05/2017 01:24, Aiello, Christina wrote:
>
>> Dear list,
>>
>> I'm very new to MCMCglmm but have done my best to read up on Jarrod
>> Hadfield's package documents, tutorials and various examples posted online
>> and discussed on this forum. I'm having trouble fitting what I thought was
>> a fairly simple binomial mixed effects model using MCMCglmm. I'll start by
>> describing the data, then the model, then my problem and questions:
>>
>> My dataset comprises unique dyads - pairs of animals located at one
>> of four sites (C1, C2, R1, R2). The response variable, 'contact',
>> indicates that the dyad did (1) or did not (0) interact over the course
>> of the study.
>> The unique id of the members of the dyad are 'tort1' and 'tort2'. Because
>> individuals appear in multiple dyads, I've included a random effect for
>> tortID using the multiple membership function available in the package to
>> account for the non-independence of observations and the fact that some
>> individuals may have a tendency to contact more than others. For fixed
>> effects, in this simplified model I only have one categorical variable,
>> 'site' (which I would have entered as a random effect, but I only have 4
>> levels), and one continuous variable, 'overlap', which is an estimate of
>> space-use similarity for each dyad. I centered and scaled this variable by
>> the non-zero mean value and standard deviation (though I've also tried the
>> model without centering). This may be relevant to my problem: the
>> distribution of 'overlap' is highly skewed, with mostly zero values;
>> similarly, the response variable 'contact' is rare and characterized by
>> mostly zeros.
>>
>> > table(datafi$contact, datafi$site)
>>
>>      C1  C2  R1  R2
>>   0 241 229 176 181
>>   1  35  24  14   9
>>
>> The model:
>>
>> pr <- list(R = list(V = 1, nu = 0, fix = 1),
>>            G = list(G1 = list(V = 1, nu = 0.002)))
>>
>> m1 <- MCMCglmm(
>>   fixed  = contact ~ 1 + site + overlap,
>>   random = ~ mm(tort1 + tort2),
>>   data   = datafi,
>>   family = "categorical", verbose = FALSE,
>>   pr = TRUE, pl = TRUE, prior = pr,
>>   nitt = 4100000, thin = 2000, burnin = 100000
>> )
>>
>> > summary(m1)
>>
>> Iterations = 100001:4098001
>> Thinning interval = 2000
>> Sample size = 2000
>>
>> DIC: 207.4525
>>
>> G-structure: ~mm(tort1 + tort2)
>>
>> post.mean l-95% CI u-95% CI eff.samp
>> tort1+tort2 2.128 0.0002693 5.488 414.6
>>
>> R-structure: ~units
>>
>> post.mean l-95% CI u-95% CI eff.samp
>> units 1 1 1 0
>>
>> Location effects: contact ~ (1 + site + overlap)
>>
>> post.mean l-95% CI u-95% CI eff.samp pMCMC
>> (Intercept) -2.2102 -3.6055 -0.7437 1505.6 0.004 **
>> siteC2 -0.4143 -2.7572 1.4982 1808.4 0.708
>> siteR1 -1.2543 -4.0794 0.8424 1069.0 0.268
>> siteR2 -1.4753 -3.9300 0.9782 1348.9 0.205
>> overlap 3.9025 2.8273 5.1260 488.5 <5e-04 ***
>>
>>
>> As far as I can tell, the chains themselves look good, and if I run
>> multiple chains the Gelman-Rubin diagnostic PSRF values are all 1 or
>> 1.01. The parameter estimates are consistent and make sense. The problem
>> lies in the autocorrelation - large amounts in the 'overlap' variable and
>> many of the random intercepts. Here's a sample of the autocorr results:
>>
>> > sort(diag(autocorr(m1$Sol)[2,,]))
>>
>> ## these are the worst offenders
>> (Intercept) 0.0926484964   tort1.4534 0.0938622549
>> tort1.3719  0.1009204749   tort1.33   0.1049459123
>> tort1.3620  0.1065261665   tort1.30   0.1090179501
>> tort1.3045  0.1237642453   siteR2     0.1339370132
>> tort1.3150  0.1359132027   tort1.5579 0.1383816060
>> siteR1      0.1506535457   tort1.5473 0.1639062068
>> tort1.2051  0.1682852625   tort1.3092 0.1683907054
>> tort1.5044  0.1752670493   tort1.804  0.1767909176
>> tort1.5141  0.1865412328   tort1.5103 0.1919929722
>> tort1.4148  0.2257633018   tort1.4678 0.2318115800
>> tort1.4428  0.2521806794   tort1.3633 0.2577034325
>> tort1.3335  0.2593673083   tort1.5101 0.2602145001
>> tort1.3043  0.2717718040   tort1.26   0.3487288823
>> tort1.2014  0.3748047689   tort1.6    0.4478979043
>> overlap     0.5400325556
>>
>>
>> > autocorr.diag(m1$VCV)
>>
>> tort1+tort2 units
>> Lag 0 1.00000000 NaN
>> Lag 2000 0.58292962 NaN
>> Lag 10000 0.12771910 NaN
>> Lag 20000 0.05262786 NaN
>> Lag 1e+05 0.01757316 NaN
>>
>> I've attempted to fit the model with both uninformative (shown above) and
>> parameter-expanded priors (
>> pr2 <- list(R = list(V = 1, nu = 0, fix = 1),
>>             G = list(G1 = list(V = 1, nu = 1, alpha.mu = 0, alpha.V = 1000)))
>> ), with the parameter-expanded priors performing slightly worse. I've
>> attempted incrementally larger iteration, thinning, and burn-in values,
>> increasing the thinning interval to as high as 2000 with a large burn-in
>> (100000) in hopes of improving convergence and reducing autocorrelation.
>> I've tried slice sampling and saw little improvement. Nothing I tried
>> while retaining this model structure improved the acfs. I checked the
>> latent variable estimates and all were under 20, with a mean of -5.
>>
>> The only way I was able to reduce the autocorrelation was to fit a model
>> without the random effect, which isn't ideal as I'm ignoring repeated
>> measures of individuals among dyads. I've read on this forum that random
>> effects in binomial models are notoriously hard to estimate with this
>> package, and I've also read that one should not just increase thinning to
>> deal with the problem (MEE 2012 Link & Eaton
>> <http://onlinelibrary.wiley.com/store/10.1111/j.2041-210X.2011.00131.x/asset/j.2041-210X.2011.00131.x.pdf?v=1&t=j29nl332&s=e0f97f28309122f2bbfa66bccb0cd445696e2f15>).
>> Interestingly, I have count responses associated with all interacting
>> dyads, and I can fit zero-truncated models to those responses just fine
>> with the same fixed and random effects.
>>
>> My questions are then:
>> 1) Do you think there is something inherently wrong with the data or just
>> problems with the mixing algorithms in the context of this data?
>> 2) Are there any other changes to the MCMCglmm specification I might try
>> to improve mixing? Or any problems with my current specification?
>> 3) Any suggestions on where to go from here?
>>
>>
>> I would greatly appreciate any insights and am happy to provide further
>> info as needed!
>>
>> Christina
>>
>>
>> _______________________________________________
>> R-sig-mixed-models at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>>
>
>
> --
> The University of Edinburgh is a charitable body, registered in
> Scotland, with registration number SC005336.
>
>
>