[R-sig-ME] Multi-response model to estimate Correlation

Thu Jun 23 22:04:27 CEST 2011

Hi Xavier,

I think it is probably best to use the posterior.cor function so I’m glad
that worked out. 

Regarding the priors I am a bit wary to say too much since I have very
little confidence in my understanding of them. Nonetheless some things I’ve
read suggest that the inverse gamma (set 1) was typically used but that it
has been found to be a bit problematic versus the inverse Wishart (set 2). I
think though that both still get used a fair bit. 

There really isn’t much of a difference in your estimates (0.018) so that
could also just come down to the uniqueness of a particular chain, not
necessarily even sensitivity to priors. It might be useful to run multiple
chains and then some of the MCMC diagnostics (Gelman and Rubin 1992;
gelman.diag{coda}) that are available (though these don’t seem to be used
much in the evolutionary ecology literature). Note that the gelman.diag
function requires that you have overdispersed starting values; i.e. insert
start=list(QUASI=FALSE) into the MCMCglmm code.

Also, since you're specifying G as a matrix with off-diagonal elements equal
to zero your prior is specifying a covariance/correlation of zero to start
with.

If multiple chains give similar estimates from inverse Wishart runs and
satisfy the gelman.diag criteria I would just go with that--cautioning again
that I may not be the best person to listen to!

Good luck,
Ned

--
Ned Dochtermann
Department of Biology
University of Nevada, Reno

ned.dochtermann at gmail.com
http://wolfweb.unr.edu/homepage/mpeacock/Ned.Dochtermann/
http://www.researcherid.com/rid/A-7146-2010
--

From: Xavier Harrison [mailto:xav.harrison at gmail.com] 
Sent: Thursday, June 23, 2011 6:37 AM
To: Ned Dochtermann
Cc: r-sig-mixed-models at r-project.org
Subject: Re: Multi-response model to estimate Correlation

Hi Ned

Many thanks for the 'off the shelf' code', which proved very useful. There
was only a small discrepancy between the point estimate of the correlation
from the posterior mode of the (co)variances (-0.276) versus the explicit
estimate of the posterior mode of the correlation, from your code (-0.292,
95% CI -0.09 - -0.45). However I guess this difference is small when one
takes the span of the CIs into account. Unless someone chimes in crying
heresy I will probably report the estimates from the code you provided.

A further point that has arisen from this, is the differences among
estimates from choice of priors.

#Set 1
multiprior3<-list(R=list(V=diag(2)*1e-6,
nu=1.002),G=list(G1=list(V=diag(2)*1e-6, nu=1.002, alpha.mu=c(0,0),
alpha.V=diag(2)*1000)))

#Set 2: Uninformative Multivariate Wishart from Dochtermann webpage
prior.miw<-list(R=list(V=diag(2), nu=2),G=list(G1=list(V=diag(2),nu=2,
alpha.mu=c(0,0),alpha.V=diag(2)*1000)))

When I use set 1 above, I get the estimate I reported. When using set 2,
taken from your mixed model posting, I get an estimate of -0.31 (95% CI -0.1
- -0.55). I was wondering if 1)This difference in estimates is large enough
that one would be concerned about sensitivity to priors and 2) Whether it
would be acceptable to use the smallest of these estimates (from set 1) as a
conservative estimate of the correlation to report in the paper?

Thanks again

Xav

---------------------------------------------------------
Dr Xavier Harrison
BBSRC Postdoctoral Research Associate
Centre for Ecology and Conservation
University of Exeter
Tremough Campus
Penryn
Cornwall
TR10 9EZ
Tel: (01326) 371872
xav.harrison at gmail.com

On 21 Jun 2011, at 23:53, Ned Dochtermann wrote:

Hi Xavier,

I'm far from an expert (to put it graciously) so it might be best to wait on
some of the heavy hitters but for the correlation you might consider
specifically estimating the posterior mode for the correlation. What you are
doing now is estimating the correlation from the posterior modes of the
covariance and two variances, which is slightly different. Hopefully someone
can weigh in on whether one is actually "better" than the other (as an
aside, it would be nice if you reported the unstandardized covariance and
variance posterior modes in addition to the correlation).

to return the correlation:
      posterior.mode(posterior.cor(multi3prior2yr $VCV[,1:4])[,2])

credibility interval:
      HPDinterval (posterior.cor(multi3prior2yr $VCV[,1:4])[,2])

Presumably the posterior.cor function effectively converts the covariance to
a correlation for each MCMC sample, from which you can then draw the
appropriate values. If your burn-in was sufficient and thinning appropriate
I doubt this makes too much of a difference but it may make a small one.

On the topic of priors, I don't know much and have had a very difficult time
finding approachable material on the topic. The MCMCglmm program seems to
also conflate the shape of the distribution with the "confidence" placed on
the distribution within a single parameter which makes it even harder for me
to follow. I started a thread awhile back listing out some prior
specifications I had gleaned from the listerv:
https://stat.ethz.ch/pipermail/r-sig-mixed-models/2011q2/006088.html
As you can see I had to back off the topic in the second post due to my
ignorance and unfortunately the topic wasn't picked up by others.

Good luck!
Ned

--
Ned Dochtermann
Department of Biology
University of Nevada, Reno

ned.dochtermann at gmail.com
http://wolfweb.unr.edu/homepage/mpeacock/Ned.Dochtermann/
http://www.researcherid.com/rid/A-7146-2010
--

------------------------------

Message: 3
Date: Tue, 21 Jun 2011 14:19:36 +0100
From: Xavier Harrison <xav.harrison at gmail.com>
To: r-sig-mixed-models at r-project.org
Subject: [R-sig-ME] Multi-response model to estimate Correlation
	between	Traits using MCMCglmm
Message-ID: <255B0F83-79E0-4338-897D-6BB5956FC653 at gmail.com>
Content-Type: text/plain

Dear List

I'm studying Heterozygosity-Fitness correlations in a migratory bird, and
have used MCMCglmm to fit Poisson models of the form "Reproductive
Success~Heterozygosity". These models run well, and show a positive effect
of heterozygosity on number of offspring produced. However, one reviewer has
asked that I fit a multi-response model where no. of juveniles and
heterozygosity are run as a single response, rather than outcome and
predictor, so that I may estimate the correlation between them. I list my
analysis below because I would greatly appreciate some advice to ensure I've
gone about this the correct way, as I would like this to be watertight when
this goes back out to review.

To do so, I have largely followed the guidelines from the course notes (~ pg
90) and from this post here:
https://stat.ethz.ch/pipermail/r-sig-mixed-models/2011q1/005694.html

My model is as follows:

multi3prior2yr<-MCMCglmm(cbind(juv,ir)~trait-1,random=~us(trait):tempcode,rc
ov=~us(trait):units,data=yr2,family=c("poisson","gaussian"),verbose=FALSE,pr
ior=multiprior3,burnin=20000,nitt=120000,thin=100)

"tempcode" is the indicator variable referencing members of breeding pairs,
as sometimes both the male and female have been sampled, and will have the
same reproductive success but different heterzygosities. "juv" is number of
offspring and "ir" is heterozygosity, calculated as internal relatedness
following Amos (2001).

The priors I have used for these models are as follows. As far as I can see
the choice of prior doesn't seem to affect the estimates too much, which I
take as a good sign, although when using sets 2 and 3 the estimate
attenuates slightly, which I expected as I read something about how set1 of
priors might actually be fairly informative for this kind of model. I'm not
certain, however, that I've specified the "uninformative" priors correctly.
For these models, autocorr stats all look good (<0.1 between stored
iterations, as suggested by the course notes).

multi.prior<-list(R=list(V=diag(2),nu=2),G=list(G1=list(V=diag(2),nu=2)))

multiprior2<-list(R=list(V=diag(2), nu=2),G=list(G1=list(V=diag(2), nu=2,
alpha.mu=c(0,0), alpha.V=diag(2)*1000)))

multiprior3<-list(R=list(V=diag(2)*1e-6,
nu=1.002),G=list(G1=list(V=diag(2)*1e-6, nu=1.002, alpha.mu=c(0,0),
alpha.V=diag(2)*1000)))

To assess the correlation between outcome variables, I follow the example in
the link above where the vcov matrix is extracted from the model object and
then converted to a correlation using the "cov2cor" function. I've written a
basic function to do this for any model object from this set of models

mc.corr<-function(mcobject){
xx<-matrix(c(posterior.mode(mcobject$VCV)[1],posterior.mode(mcobject$VCV)[2]
,posterior.mode(mcobject$VCV)[3],posterior.mode(mcobject$VCV)[4]),2)
return(cov2cor(xx))
}

This returns a matrix of the form:

mc.corr(multi3prior2yr)
          [,1]       [,2]
[1,]  1.0000000 -0.2760332
[2,] -0.2760332  1.0000000

which I believe to be the correlation between heterozygosity and
reproductive success at the group level (breeding pair) rather than at the
unit level.

This is in agreement with the mixed effects regression where increasing
homozygosity (IR increases) causes a reduction in number of juveniles, so at
least superficially appears correct. My concern is that the nature of my
model parameterisation may be overestimating the degree of correlation.
Taking a step back from that, is the "cov2cor" function an acceptable way of
estimating this correlation?

Any advice regarding model or prior specification is greatly appreciated.

Many thanks

Xavier Harrison

---------------------------------------------------------
Dr Xavier Harrison
BBSRC Postdoctoral Research Associate
Centre for Ecology and Conservation
University of Exeter
Tremough Campus
Penryn
Cornwall
TR10 9EZ
Tel: (01326) 371872
xav.harrison at gmail.com