[R-sig-ME] A zero inflated Poisson model in MCMCglmm
Jarrod Hadfield
j.hadfield at ed.ac.uk
Thu Jul 12 15:13:09 CEST 2012
Hi,
Quoting Helen Ward <h.l.ward at qmul.ac.uk> on Wed, 11 Jul 2012 11:58:50 +0100:
> Dear Jarrod and David,
>
> Thank you both very much for your responses: I'm sorry it's taken me
> so long to say that, but I wanted to have a proper go at everything
> before I replied, which took longer than I anticipated!
>
> Thank you especially for your zipoisson code, Jarrod. When I
> initially attempted to use your prior and model MCMCglmm asked me
> for a stronger prior, so I changed it to
>
>
> Prior1=list(R=list(V=diag(2),nu=1, fix=2), G=list(G1=G1)) : is that
> a reasonable thing to have done? The model
>
>
I think it is better to diagnose the problem rather than upping nu. In
this case I think the residual variance of the Poisson process must be
going to zero under the flat prior? Perhaps the counts are
under-dispersed with respect to the Poisson, further motivating an
ordinal model?
model1.1<-MCMCglmm(Toes~trait,random=~us(trait):animal,family="zipoisson",rcov=~idh(trait):units,pedigree=ped,data=data,prior=Prior1,nitt=500000,thin=500,burnin=200000,verbose=FALSE)
>
> ran with this prior, however I admit I am now flummoxed as to how to
> proceed and go about estimating heritability.
>
>
> For this reason, (and because you both recommended it!), I went on
> to try an ordinal model using the following code:
>
> p.var<-var(data$Toes,na.rm=TRUE)
>
> prior2<-list(G=list(G1=list(V=matrix(p.var/2),n=1)),
> R=list(V=matrix(p.var/2),n=1))
>
>
IMPORTANT: The residual variance is not identifiable in an ordinal
model so all the information is coming from the prior. This is OK, but
I would fix the variance at 1 so as to avoid interpretation
difficulties (and numerical problems - check the range of the latent
variables using pl=TRUE).
model2<-MCMCglmm(Toes~1,random=~animal,family="ordinal",pedigree=ped,data=data,prior=prior2,nitt=500000,thin=500,burnin=200000,verbose=FALSE)
>
>
> followed by the same model but with the prior from the Wilson paper erratum:
>
> prior2.1 <- list(G = list(G1 = list(V = 1, nu = 0.002)), R = list(V
> = 1,nu = 0.002))
>
>
> the plots for these models look ok, and they both give a
> heritability estimate of ~0.55, with very similar 95% HDP intervals,
> although the auto correlation is quite high ~0.6
Is this calculated as Vanimal/(Vanimal+Vresidual+1) or
Vanimal/(Vanimal+Vresidual)?
>
>
> Next I tried a categorical model, treating Toes as a binary response
> variable (deforemed or not deformed), using
>
> p.var<-var(data$ToesBinary,na.rm=TRUE)
>
> prior3= list(R = list(V = 1, n = 0, fix = 1), G = list(G1 = list(V =
> 1,n = 0)))
>
> model3<-
> MCMCglmm(ToesBinary~1,random=~animal,family="categorical",data=data,pedigree=ped,prior=prior3,nitt=200000,thin=250,burnin=100000,verbose =
> FALSE)
>
>
> Heritability came out higher with this model at about ~0.85.
Is this calculated as Vanimal/(Vanimal+Vresidual+pi^2/3) or
Vanimal/(Vanimal+Vresidual)?
>
>
> And then last but not least a multinomial2 model:
>
> prior4=list(R=list(V=1,nu=0.002),G=list(G1=list(V=1,nu=1,alpha.mu=0,alpha.V=100)))
>
> model4<-MCMCglmm(cbind(Deformed,Fine)~1,random=~animal,family="multinomial2",data=data,pedigree=ped,prior=prior4,nitt=200000,thin=250,burnin=100000,verbose=FALSE)
>
> The variance plots for this one didn't look very pretty and the
> autocorrelation values were still quite high, ~0.8, but it gave me a
> pretty heritability plot and an estimate similar to my ordinal models.
>
>
> As you can probably gather from all of this, I am something of an
> indecisive person...., but at this stage I am inlcined to stick with
> the ordinal models and believe that there is a heritable element to
> the number of deformed toes (which seems rather intuitive after all
> that!). Given this decision, can you recommend anything I might do
> to reduce the autocorrelation in these models. Also, do I have to
> run the same model many times to confirm its conclusions?
Re: autocorrelation. Use parameter expansion and/or run for longer
(particularly for binary animal models and/or if there is little
overdispersion) If the absolute value of the latent variables under
the logit link exceed 20, or under the probit link exceed 7, then at
the moment I would consider another program unless the latent
variables are associated with particular levels of a fixed effect.
Then flatish priors on the probability scale may overcome the problem.
>
>
> Finally finally, I will have to do another zipoisson model in the
> future, can you direct me to where I might find an example of the
> code used for estimating heritability in these cases please?
I think it makes most sense to think about the heritabilities of each
process separately - the first as a Poisson trait and the second as
a threshold (binary) trait.
>
>
> Many many thanks once more,
> Helen
>
>
>
>
> On 08/07/2012 12:03, Jarrod Hadfield wrote:
>> Hi,
>>
>> I agree with David that an ordinal model may behave better for this
>> type of data (or even a 0/1 response with deformed or not) but in
>> case you want to pursue a ZIP model...
>>
>> The model you have specified is probably not what you had in mind.
>> You should think of the ZIP response as being two responses
>> (traits): the Poisson counts (trait 1) and the probability that a
>> zero comes from the zero-inflation process (trait 2). At the moment
>> you have a common intercept for both and therefore assume that the
>> probability that a zero is from the zero-inflation process (on the
>> logit scale) is equal to the mean Poisson count (on the log scale).
>> There is probably little justification for this (but see ZAP
>> models). Moreover, you have a single animal term, and therefore
>> assume that both processes have the same genetic variance and that
>> the genetic correlation between them is 1. You also assume that the
>> residual variance for each process is equal (but the residual
>> correlation is 0), although this is not actually that bigger deal
>> because residual variation in the zero-inflation process (and
>> residual corrletaion) is not identifiable from the data and so the
>> value it could take is essentially arbitrary. However, I like to
>> recognise this non-identifiability by fixing the variance at 1 and
>> the covariance at zero (but see ZAP models where your specification
>> is useful).
>>
>> So, a more appropriate model would be:
>>
>> model1.1<-MCMCglmm(Toes~trait,random=~us(trait):animal,family="zipoisson",rcov=~idh(trait):units,pedigree=ped,data=data,prior=prior1.1,nitt=500000,thin=500,burnin=200000,verbose=FALSE) or
>> perhaps
>>
>> model1.2<-MCMCglmm(Toes~trait,random=~idh(trait):animal,family="zipoisson",rcov=~idh(trait):units,pedigree=ped,data=data,prior=prior1.1,nitt=500000,thin=500,burnin=200000,verbose=FALSE) The difference between these two is that the genetic correlation between the zero-inflation and the Poisson processes is estimated, and in the latter it is set to
>> 0.
>>
>> I'm not sure about how much data you have, but I would think that
>> if the incidence of deformed toes is very low, you probably have
>> little information for some of the parameters. In this case you
>> have to be VERY careful with priors. The prior recommendations in
>> the MCMCglmm tutorial in Wilson et al. 2011 will generally be quite
>> informative for small or even moderate sized datasets (the authors
>> have kindly published an erratum as many people seem to be using
>> their recommendation without realising its implications).
>>
>> The prior could look like:
>>
>> prior=list(R=list(V=diag(2), nu=0, fix=2), G=list(G1=G1))
>>
>> which fixes the residual variance in the zero-inflation process to
>> 1 with a flat improper prior on the Poisson over-dispersion. For
>> the prior on the genetic (co)variances (G1) I generally use
>> parameter expanded priors of the form:
>>
>> G1=list(V=diag(2), nu=2, alpha.mu=c(0,0), alpha.V=diag(2)*1000)
>>
>> or
>>
>> G1=list(V=diag(2), nu=1, alpha.mu=c(0,0), alpha.V=diag(2)*1000)
>>
>> for model.2.
>>
>> These generally give posterior modes for the variance components
>> that are close to (RE)ML estimates, although it is unclear to me
>> how they behave for the covariances and functions of variances
>> (e.g. heritability). At least for binary traits (e.g. the
>> zero-inflation bit) I have seen a chi-square prior recommended
>> because of its prior properties on the heritability (de
>> Villemereuil in MEE). This type of prior can also be obtained using
>> parameter expansion:
>>
>> (V=1, nu=1000, alpha.mu=0, alpha.V=1)
>>
>> and I would guess that the multivariate analogue would be
>> (V=diag(2), nu=1001, alpha.mu=c(0,0), alpha.V=diag(2)). However, a
>> chi-square prior probably has poor properties for the Poisson
>> variance and in MCMCglmm you could only mix these different priors
>> for model.2 using a different syntax:
>>
>> model1.2<-MCMCglmm(Toes~trait,random=~idh(at(trait,1)):animal+idh(at(trait,2)):animal,family="zipoisson",rcov=~idh(trait):units,pedigree=ped,data=data,prior=prior1.1,nitt=500000,thin=500,burnin=200000,verbose=FALSE)
>> G=list(
>> G1=list(V=1, nu=1, alpha.mu=0, alpha.V=1000),
>> G2=list(V=1, nu=1000, alpha.mu=0, alpha.V=1)
>> )
>>
>>
>> Cheers,
>>
>> Jarrod
>>
>>
>>
>>
>> Quoting Helen Ward <h.l.ward at qmul.ac.uk> on Wed, 04 Jul 2012 16:24:36 +0100:
>>
>>> Dear list,
>>>
>>> I work on bats, some of which have deformed toes. I have just started
>>> trying to use MCMCglmm to investigate whether there is hertiable
>>> variation for the number of deformed toes a bat has. Since most bats
>>> have 0 deformed toes I am using a zero-inflated Poisson model.
>>>
>>> I have put together some code (see below) based on the MCMCglmm tutorial
>>> from the Wilson et al. 2011 'Ecologist's guide to the animal model'
>>> paper, the MCMCglmm Course Notes and trial and error. This code runs and
>>> I get a model and a heritability estimate. However, the heritability
>>> estimate I get is extremely high and this, teamed with the observations
>>> that my posterior distributions of the variance componants are not
>>> pretty as the example ones I've seen and my autocorrelation values are
>>> high (~0.4), makes me suspicious that I am not modelling this trait very
>>> well. I suspect I have over simplified the model. I based my prior on a
>>> model done in ASReml modelling 'Toes' as a Poisson distribution, which
>>> can an heritability estimate of 0.35.
>>>
>>> In addition, when I try and add a second random variable - year of birth
>>> (Born) - into a second model, both plots and auto correlation get worse
>>> and my heritability estimate for number of deformed toes (Toes) goes
>>> from about 0.8 to 0.05!
>>>
>>> Finally, I suspect you're not, but I would like to know if you are
>>> allowed to compare DIC values from models with different priors please.
>>>
>>> I have pasted my code below. If anyone can offer constructive criticism
>>> it would be much appreciated.
>>>
>>> Many thanks,
>>> Helen
>>>
>>> Model 1: With number of deformed toes (Toes) as the only random variable
>>>
>>>
>>> p.var<-var(data$Toes,na.rm=TRUE)
>>> prior1.1<-list(G=list(G1=list(V=matrix(p.var*0.35),n=1)),
>>> R=list(V=matrix(p.var*0.65),n=1))
>>>
>>> model1.1<-MCMCglmm(Toes~1,random=~animal,family="zipoisson",rcov=~trait:units,pedigree=ped,data=data,prior=prior1.1,nitt=500000,thin=500,burnin=200000,verbose=FALSE) posterior.heritability1.1<-model1.1$VCV[,"animal"]/(model1.1$VCV[,"animal"]+model1.1$VCV[,"trait:units"])
>>> HPDinterval(posterior.heritability1.1,0.95)
>>> posterior.mode(posterior.heritability1.1)
>>>
>>>
>>> Model 2: With number of toes (Toes) and year of birth (Born) as random
>>> variables
>>>
>>>
>>> p.var<-var(data$Toes,na.rm=TRUE)
>>> prior1.2<-list(G=list(G1=list(V=matrix(p.var/3),n=1),G2=list(V=matrix(p.var/3),n=1)),R=list(V=matrix(p.var/3),n=1))
>>> model1.2<-
>>> MCMCglmm(Toes~1,random=~animal+Born,family="zipoisson",rcov=~trait:units,pedigree=ped,data=data,prior=prior1.2,nitt=1000000,thin=1000,burnin=500000,verbose=FALSE) posterior.heritability1.2<-model1.2$VCV[,"animal"]/(model1.2$VCV[,"animal"]+model1.2$VCV[,"Born"]+model1.2$VCV[,"trait:units"])
>>> HPDinterval(posterior.heritability1.2,0.95)
>>> posterior.mode(posterior.heritability1.2)
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> [[alternative HTML version deleted]]
>>>
>>> _______________________________________________
>>> R-sig-mixed-models at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>>>
>>>
>>
>>
>>
>
>
>
--
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.
More information about the R-sig-mixed-models
mailing list