[R-sig-ME] persistant autocorrelation in binomial MCMCglmm

Aiello, Christina caiello at usgs.gov
Fri May 5 23:03:40 CEST 2017


Hi Jarrod,

I have an update on the model performance with a probit link - the
autocorrelation is much better behaved with the "threshold" family. All ACF
values are <0.1 for iteration and thinning #s that were resulting in
autocorrelation using the logit link.

Looking at the latent variables, though, a lot of the distributions
included values below -7 (lowest was -10). All of the means where within
the -7 to 7 range though, because only a few estimates per observation
tended to reach very low negative values. Are *any* estimates outside the
range considered problematic?

I noticed on the forum a post where someone else had this issue (
https://stat.ethz.ch/pipermail/r-sig-mixed-models/2012q3/019067.html) and I
also tried the chi square prior you suggested for that problem (V=1,
nu=1000, alpha.mu=0, alpha.V=1) but the result was the same.

In terms of the data and system, I would expect an extremely low, near zero
probability of interaction for some of these dyads because they are not
using similar areas and so are not physically able to interact. Is this
signal perhaps too strong? If my goal is to weed out these improbable
interactions, though, will the model not serve this purpose?

Many thanks,

Christina

Christina M. Aiello
Biologist- U.S. Geological Survey
Las Vegas Field Station
160 N. Stephanie St.
Henderson, NV 89074
(702) 481-3957
caiello at usgs.gov

On Thu, May 4, 2017 at 10:17 AM, Aiello, Christina <caiello at usgs.gov> wrote:

> Hi Jarrod,
>
> Appreciate the quick response and thoughts
>
> 1) I thought I had checked the absolute value of the latent variables, but
> now that I look again, I must not have examined both ends of the
> distribution. There are 167 observations whose latent variable
> distributions dip below -20 (minimum was -32). In each of these cases, the
> left tail of the distribution includes very few estimates at such low
> values. Do you think this has to do with the rarity of a response of 1 in
> the dataset? Or might this be indicative of another problem?
>
> 2) I'll give the probit link a try today and see how the results compare
>
> 3) I can definitely let the chains run longer, and continue increasing
> thinning? I was hesitant to keep upping the values because I haven't seen
> many published analyses using iterations and intervals beyond what I've
> been trying. I was worried having to run the chain so long might be
> indicative of other underlying problems that I wasn't considering.
>
> Many thanks!
> Christina
>
> Christina M. Aiello
> Biologist- U.S. Geological Survey
> Las Vegas Field Station
> 160 N. Stephanie St.
> Henderson, NV 89074
> (702) 481-3957
> caiello at usgs.gov
>
> On Wed, May 3, 2017 at 9:31 PM, Jarrod Hadfield <j.hadfield at ed.ac.uk>
> wrote:
>
>> Hi Christina,
>>
>> 1/ The model syntax looks fine, it is just that MCMCglmm is not very
>> efficient for this type of problem. You say there are no numerical issues
>> because the latent variable is under 20. However, are they commonly under
>> -20?
>>
>> 2/ Centering and scaling the covariates will not effect the mixing
>> because MCMCglmm is block updating all location effects.  Moving from a
>> logistic model (family="categorical") to a probit model
>> (family="threshold") will probably improve mixing, and the inferences will
>> be pretty much the same.
>>
>> 3/ You could also up the number of iterations - but perhaps this takes
>> too much time? Link and Eaton's recommendation not to thin is fine if you
>> are not worried about filling your hard drive. You reduce the Monte Carlo
>> error by saving additional correlated samples, but if the total number of
>> samples you can store is limited you are better storing uncorrelated
>> samples (obtained by thinning) because this reduces the Monte Carlo error
>> more.
>>
>> Cheers,
>>
>> Jarrod
>>
>>
>>
>>
>>
>> On 04/05/2017 01:24, Aiello, Christina wrote:
>>
>>> Dear list,
>>>
>>> I'm very new to MCMCglmm but have done my best to read-up on Jarrod
>>> Hadfield's package documents, tutorials and various examples posted
>>> online
>>> and discussed on this forum. I'm having trouble fitting what I thought
>>> was
>>> a fairly simple binomial mixed effects model using MCMCglmm. I'll start
>>> by
>>> describing the data, then the model, then my problem and questions:
>>>
>>> My dataset is comprised of unique dyads - pairs of animals located at one
>>> of four sites (C1, C2, R1, R2). The response variable, 'contact'
>>> indicates
>>> that the dyad did (1) or did not (0) interact over the course of the
>>> study.
>>> The unique id of the members of the dyad are 'tort1' and 'tort2'. Because
>>> individuals appear in multiple dyads, I've included a random effect for
>>> tortID using the multiple membership function available in the package to
>>> account for the non-independence of observations and the fact that some
>>> individuals may have a tendency to contact more than others. For fixed
>>> effects, in this simplified model I only have one categorical variable,
>>> 'site' (which I would have entered as a random effect but I only have 4
>>> levels) and one continuous variable, 'overlap' - which is an estimate of
>>> space-use similarity for each dyad. I centered and scaled this variable
>>> by
>>> the non-zero mean value and standard deviation (though I've also tried
>>> the
>>> model without centering). This may be relevant to my problem: 'overlap's
>>> distribution is highly skewed and mostly zero values - similarly, the
>>> response variable 'contact' is rare and characterized by mostly zeros.
>>>
>>> table(datafi$contact, datafi$site)
>>>>
>>>       C1  C2  R1  R2
>>>    0 241 229 176 181
>>>    1  35  24  14   9
>>>
>>> The model:
>>>
>>> pr<-list( R= list(V=1,  n=0, fix=1), G= list(G1=list(V=1, n=0.002)) )
>>>
>>> m1 <- MCMCglmm(
>>>
>>> fixed = contact ~ (1 + site + overlap ) ,
>>>
>>> random = ~mm(tort1 +tort2),
>>>
>>> data = datafi,
>>>
>>> family = "categorical", verbose = FALSE,
>>>
>>> pr=TRUE, pl=TRUE, prior=pr,
>>>
>>> nitt=4100000 , thin=2000 , burnin= 100000
>>>
>>> )
>>>
>>>> summary(m1)
>>>>
>>>   Iterations = 100001:4098001
>>>   Thinning interval  = 2000
>>>   Sample size  = 2000
>>>
>>>   DIC: 207.4525
>>>
>>>   G-structure:  ~mm(tort1 + tort2)
>>>
>>>              post.mean  l-95% CI u-95% CI eff.samp
>>> tort1+tort2     2.128 0.0002693    5.488    414.6
>>>
>>>   R-structure:  ~units
>>>
>>>        post.mean l-95% CI u-95% CI eff.samp
>>> units         1        1        1        0
>>>
>>>   Location effects: contact ~ (1 + site + overlap)
>>>
>>>              post.mean l-95% CI u-95% CI eff.samp  pMCMC
>>> (Intercept)   -2.2102  -3.6055  -0.7437   1505.6  0.004 **
>>> siteC2        -0.4143  -2.7572   1.4982   1808.4  0.708
>>> siteR1        -1.2543  -4.0794   0.8424   1069.0  0.268
>>> siteR2        -1.4753  -3.9300   0.9782   1348.9  0.205
>>> overlap        3.9025   2.8273   5.1260    488.5 <5e-04 ***
>>>
>>>
>>> As far as I can tell, the chains themselves look good and if I run
>>> multiple
>>> chains and run the Gelman-Rubin diagnostic, the PSRF values are all 1 or
>>> 1.01. The parameter estimates are consistent and make sense. The problem
>>> lies in the autocorrelation - large amounts in the 'overlap' variable and
>>> many of the random intercepts. Here's a sample of the autocorr results:
>>>
>>> sort(diag(autocorr(m1$Sol)[2,,]))
>>>>
>>> ##these are the worst offenders
>>>    (Intercept)    tort1.4534    tort1.3719      tort1.33    tort1.3620
>>>   tort1.30    tort1.3045
>>>   0.0926484964  0.0938622549  0.1009204749  0.1049459123  0.1065261665
>>>   0.1090179501  0.1237642453
>>>         siteR2    tort1.3150    tort1.5579        siteR1    tort1.5473
>>>   tort1.2051    tort1.3092
>>>   0.1339370132  0.1359132027  0.1383816060  0.1506535457  0.1639062068
>>>   0.1682852625  0.1683907054
>>>     tort1.5044     tort1.804    tort1.5141    tort1.5103    tort1.4148
>>>   tort1.4678    tort1.4428
>>>   0.1752670493  0.1767909176  0.1865412328  0.1919929722  0.2257633018
>>>   0.2318115800  0.2521806794
>>>     tort1.3633    tort1.3335    tort1.5101    tort1.3043      tort1.26
>>>   tort1.2014       tort1.6
>>>   0.2577034325  0.2593673083  0.2602145001  0.2717718040  0.3487288823
>>>   0.3748047689  0.4478979043
>>>        overlap
>>>   0.5400325556
>>>
>>>
>>> autocorr.diag(m1$VCV)
>>>>
>>>            tort1+tort2 units
>>> Lag 0      1.00000000   NaN
>>> Lag 2000   0.58292962   NaN
>>> Lag 10000  0.12771910   NaN
>>> Lag 20000  0.05262786   NaN
>>> Lag 1e+05  0.01757316   NaN
>>>
>>> I've attempted to fit the model with both uninformative (shown above) and
>>> parameter expanded priors (
>>> pr2<-list( R= list(V=1, n=0, fix=1), G=list(G1=list(V=1, nu=1, alpha.mu
>>> =0,
>>> alpha.V=1000)) )), with parameter expanded priors performing slightly
>>> worse. I've attempted incrementally larger iteration, thinning, and burn
>>> in
>>> values, increasing the thinning to as high as 2000 with a large burn-in
>>> (100000) in hopes of improving convergence and reducing autocorrelation.
>>> I've tried slice sampling and saw little improvement. Nothing I tried
>>> while
>>> retaining this model structure improved the acfs. I checked the latent
>>> variable estimates and all were under 20, with mean of -5.
>>>
>>> The only way I was able to reduce the autocorrelation was to fit a model
>>> without the random effect, which isn't ideal as I'm ignoring repeated
>>> measures of individuals among dyads. I've read on this forum that random
>>> effects in binomial models are notoriously hard to estimate with this
>>> package and I've also read that one should not just increase thinning to
>>> deal with the problem (MEE 2012 Link & Eaton
>>> <http://onlinelibrary.wiley.com/store/10.1111/j.2041-210X.20
>>> 11.00131.x/asset/j.2041-210X.2011.00131.x.pdf?v=1&t=j29nl332
>>> &s=e0f97f28309122f2bbfa66bccb0cd445696e2f15>).
>>> Interestingly, I have count responses association with all interacting
>>> dyads and I can fit zero truncated models to those responses just fine
>>> with
>>> the same fixed and random effects.
>>>
>>> My questions are then:
>>> 1) Do you think there is something inherently wrong with the data or just
>>> problems with the mixing algorithms in the context of this data?
>>> 2) Are there any other changes to the MCMCglmm specification I might try
>>> to
>>> improve mixing? Or any problems with my current specification?
>>> 3) Any suggestions on where to go from here?
>>>
>>>
>>> I would greatly appreciate any insights and happy to provide further info
>>> as needed!
>>>
>>> Christina
>>>
>>>         [[alternative HTML version deleted]]
>>>
>>> _______________________________________________
>>> R-sig-mixed-models at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>>>
>>
>>
>> --
>> The University of Edinburgh is a charitable body, registered in
>> Scotland, with registration number SC005336.
>>
>>
>>
>

	[[alternative HTML version deleted]]



More information about the R-sig-mixed-models mailing list