[R-sig-ME] Phylogenetic Logistic Regression + MCMCglmm
Jarrod Hadfield
j.hadfield at ed.ac.uk
Tue Sep 4 11:50:21 CEST 2012
Hi,
Quoting Joanna L Baker <J.L.Baker at 2008.hull.ac.uk> on Mon, 3 Sep 2012
17:10:04 +0100:
> Dear Jarrod,
>
> Thank you very much for your response. Unfortunately I still have issues
> with convergence even when the residual variance is fixed. Part of the
> problem I was having was the specification for visualizing the posterior
> distributions of the phylogenetic portion of the analysis.
>
> This may be a silly question, but what is the difference between:
>
> plot(my2$VCV[,1])
This is the variance of the phylogenetic effects at the tips (if the
tree is scaled and ultrametric) on the latent scale.
>
> plot(my2$VCV[,1]/(rowSums(my2$VCV)+pi^2/3)).
my2$VCV[,1] depends on your choice of R$V. The above is the
intra-class correlation on the latent scale and does not depend on R.
pi^2/3 is the variance of the logistic distribution. Think of the
model in this way:
latent = mu + u + e + e'
where e is the normally distributed residual with variance R$V, and e'
is an additional residual distributed as logistic (if logit link
used), or normal (if probit link used) or extreme value (if
complementary log-log link used). The variance of e' with logit-link
is pi^2/3.
Note that the outcome is not stochastic when e' is conditioned on. In
this case if latent<0 then the outcome is 0 with probability 1.
>
> The two produce very obviously different graphs, and I was wondering
> what exactly the second part of the latter plot is related to?
>
> Even looking at the correct graph for the phylogenetic part of the
> analysis, it seems I am having trouble with 'animal' converging, and
> have massive issues with the variance. I have attached the posterior
> distributions for both your example (PhyLogRegSIM.pdf) and my own data
> (PhyLogRegDAT.pdf) to illustrate the problem. This occurs even when
> running the models for a longer time.
I would say that the phylogenetic signal in your data is high and/or
sample sizes are small and/or one outcome is rare (you need v. large
sample sizes with binary data). The chain is getting "stuck" at H2
(or lambda) =1. You could run it longer, but I suspect you are hitting
numerical problems (|latent variable| exceeding 20). One option is to
move to probit link (family="ordinal") and use the chi square prior
(see http://permalink.gmane.org/gmane.comp.lang.r.lme4.devel/8531)
which down weights H2 values very close to 0 or 1 compared to some
other priors. If the absolute value of the latent variable stays below
7, you should be OK. If not, you will probably have to move to another
piece of software which can deal with extreme predicted probabilities
more elegantly. I have used truncated latent variables in my own work,
but I'm not convinced they are a general or robust solution so have
not implemented them in the release version.
Cheers,
Jarrod
>
> I much appreciate your time and advice on this matter.
>
> Thanks,
> Joanna Baker
>
>
> Below is the code for my model specification:
>
> prior1=list(R=list(V=1, fix=1), G=list(G1=list(V=1, nu=1, alpha.mu=0,
> alpha.V=1000)))
>
> myAinv2<-inverseA(mytree)$Ainv
> my2<-MCMCglmm(DV~IV, random=~animal, ginverse=list(animal=myAinv2),
> family="categorical", prior=prior1, data=mydata,
> nitt=10000000,thin=1000,burnin=100000)
>
> plot(my2$Sol)
> plot(my2$VCV[,1]/(rowSums(my2$VCV)+pi^2/3))
>
>
> c2<-((16*sqrt(3))/(15*pi))^2
> my2.int<-my2$Sol/(sqrt(1+c2))
> my2.int
> posterior.mode(my2.int)
> summary(my2.int)
>
> -----Original Message-----
> From: Jarrod Hadfield [mailto:j.hadfield at ed.ac.uk]
> Sent: 01 September 2012 20:06
> To: Joanna L Baker
> Cc: r-sig-mixed-models at r-project.org
> Subject: Re: [R-sig-ME] Phylogenetic Logistic Regression + MCMCglmm
>
> Hi,
>
> The most likely reason is that you have not fixed the non-identified
> residual variance, although there are other possibilities. Here is an
> example with a simulated tree and binary data:
>
> tree<-rcoal(100)
> x<-rnorm(100)
> l<-rbv(tree, 1, nodes="TIPS")+x+rnorm(100) y<-rbinom(100, 1, plogis(l))
>
> dat<-data.frame(y=y, x=x, species=tree$tip.label)
>
> prior1=list(R=list(V=1, fix=1), G=list(G1=list(V=1, nu=1, alpha.mu=0,
> alpha.V=1000)))
>
> # residual variance fixed at 1.
>
>
> Ainv<-inverseA(tree)$Ainv
> m1<-MCMCglmm(y~x, random=~species, ginverse=list(species=Ainv),
> family="categorical", prior=prior1, data=dat)
>
> plot(m1$Sol)
> # fixed effects should be zero and one
>
> plot(m1$VCV[,1]/(rowSums(m1$VCV)+pi^2/3))
>
> # phylogenetic ICC should be 1/(2+pi^2/3)=0.189. Note wdie credible
> intervals - you need v.large phylogenies to get precise estimates with
> binary data.
>
> Cheers,
>
> Jarrod
>
>
>
>
>
>
> Quoting Joanna L Baker <J.L.Baker at 2008.hull.ac.uk> on Fri, 31 Aug 2012
> 11:39:08 +0100:
>
>> Hello all,
>>
>> I am currently trying to use MCMCglmm to carry out a phylogenetic
>> logistic regression, with a single binary response and one or more
>> continuous or categorical predictors. However, I am having issues with
>
>> the phylogenetic component, 'animal', converging. This occurs when I
>> use different datasets, with high, moderate and low phylogenetic
> signal.
>>
>> I have tried improper (with the expected issues), proper, and
>> alpha-expanded priors but these do not seem to have much effect on the
>
>> results. I should note that I have no issues with convergence when my
>> response is a continuous variable.
>>
>> Does anybody have a worked example with data and tree that I could use
>
>> to get started with phylogenetic logistic regression or could somebody
>
>> point me in the direction of any published work that has used the
>> package for such an analysis successfully? I'll much appreciate any
>> help with this.
>>
>> Thanks again,
>>
>> Joanna Baker
>>
>> University of Hull
>>
>> Cottingham Road
>>
>> Hull
>>
>> East Yorkshire
>>
>> j.l.baker at 2008.hull.ac.uk
>>
>>
>>
>>
>
>
>
> --
> The University of Edinburgh is a charitable body, registered in
> Scotland, with registration number SC005336.
>
>
>
--
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.
More information about the R-sig-mixed-models
mailing list