[R-sig-ME] A zero inflated Poisson model in MCMCglmm

Jarrod Hadfield j.hadfield at ed.ac.uk
Thu Jul 12 15:13:09 CEST 2012


Hi,

Quoting Helen Ward <h.l.ward at qmul.ac.uk> on Wed, 11 Jul 2012 11:58:50 +0100:

> Dear Jarrod and David,
>
> Thank you both very much for your responses: I'm sorry it's taken me  
> so long to say that, but I wanted to have a proper go at everything  
> before I replied, which took longer than I anticipated!
>
> Thank you especially for your zipoisson code, Jarrod. When I  
> initially attempted to use your prior and model MCMCglmm asked me  
> for a stronger prior, so I changed it to
>
>
> Prior1=list(R=list(V=diag(2),nu=1, fix=2), G=list(G1=G1)) : is that  
> a reasonable thing to have done? The model
>
>
I think it is better to diagnose the problem rather than upping nu. In  
this case I think the residual variance of the Poisson process must be  
going to zero under the flat prior? Perhaps the counts are  
under-dispersed with respect to the Poisson, further motivating an  
ordinal model?



model1.1<-MCMCglmm(Toes~trait,random=~us(trait):animal,family="zipoisson",rcov=~idh(trait):units,pedigree=ped,data=data,prior=Prior1,nitt=500000,thin=500,burnin=200000,verbose=FALSE)
>
> ran with this prior, however I admit I am now flummoxed as to how to  
> proceed and go about estimating heritability.
>
>
> For this reason, (and because you both recommended it!), I went on  
> to try an ordinal model using the following code:
>
> p.var<-var(data$Toes,na.rm=TRUE)
>
> prior2<-list(G=list(G1=list(V=matrix(p.var/2),n=1)),  
> R=list(V=matrix(p.var/2),n=1))
>
>
IMPORTANT: The residual variance is not identifiable in an ordinal  
model so all the information is coming from the prior. This is OK, but  
I would fix the variance at 1 so as to avoid interpretation  
difficulties (and numerical problems - check the range of the latent  
variables using pl=TRUE).


model2<-MCMCglmm(Toes~1,random=~animal,family="ordinal",pedigree=ped,data=data,prior=prior2,nitt=500000,thin=500,burnin=200000,verbose=FALSE)
>
>
> followed by the same model but with the prior from the Wilson paper erratum:
>
> prior2.1 <- list(G = list(G1 = list(V = 1, nu = 0.002)), R = list(V  
> = 1,nu = 0.002))
>
>
> the plots for these models look ok, and they both give a  
> heritability estimate of ~0.55, with very similar 95% HDP intervals,  
> although the auto correlation is quite high ~0.6

Is this calculated as Vanimal/(Vanimal+Vresidual+1)  or  
Vanimal/(Vanimal+Vresidual)?

>
>
> Next I tried a categorical model, treating Toes as a binary response  
> variable (deforemed or not deformed), using
>
> p.var<-var(data$ToesBinary,na.rm=TRUE)
>
> prior3= list(R = list(V = 1, n = 0, fix = 1), G = list(G1 = list(V =  
> 1,n = 0)))
>
> model3<-  
> MCMCglmm(ToesBinary~1,random=~animal,family="categorical",data=data,pedigree=ped,prior=prior3,nitt=200000,thin=250,burnin=100000,verbose =  
> FALSE)
>
>
> Heritability came out higher with this model at about ~0.85.

Is this calculated as Vanimal/(Vanimal+Vresidual+pi^2/3)  or  
Vanimal/(Vanimal+Vresidual)?
>
>
> And then last but not least a multinomial2 model:
>
> prior4=list(R=list(V=1,nu=0.002),G=list(G1=list(V=1,nu=1,alpha.mu=0,alpha.V=100)))
>
> model4<-MCMCglmm(cbind(Deformed,Fine)~1,random=~animal,family="multinomial2",data=data,pedigree=ped,prior=prior4,nitt=200000,thin=250,burnin=100000,verbose=FALSE)
>
> The variance plots for this one didn't look very pretty and the  
> autocorrelation values were still quite high, ~0.8, but it gave me a  
> pretty heritability plot and an estimate similar to my ordinal models.
>
>
> As you can probably gather from all of this, I am something of an  
> indecisive person...., but at this stage I am inlcined to stick with  
> the ordinal models and believe that there is a heritable element to  
> the number of deformed toes (which seems rather intuitive after all  
> that!). Given this decision, can you recommend anything I might do  
> to reduce the autocorrelation in these models. Also, do I have to  
> run the same model many times to confirm its conclusions?

Re: autocorrelation. Use parameter expansion and/or run for longer  
(particularly for binary animal models and/or if there is little  
overdispersion)  If the absolute value of the latent variables under  
the logit link exceed 20, or under the probit link exceed 7, then at  
the moment I would consider another program unless the latent  
variables are associated with particular levels of a fixed effect.   
Then flatish priors on the probability scale may overcome the problem.


>
>
> Finally finally, I will have to do another zipoisson model in the  
> future, can you direct me to where I might find an example of the  
> code used for estimating heritability in these cases please?

I think it makes most sense to think about the heritabilities of each  
process separately   - the first as a Poisson trait and the second as  
a threshold (binary) trait.

>
>
> Many many thanks once more,
> Helen
>
>
>
>
> On 08/07/2012 12:03, Jarrod Hadfield wrote:
>> Hi,
>>
>> I agree with David that an ordinal model may behave better for this  
>> type of data (or even a 0/1 response with deformed or not) but in  
>> case you want to pursue a ZIP model...
>>
>> The model you have specified is probably not what you had in mind.  
>> You should think of the ZIP response as being two responses  
>> (traits): the Poisson counts (trait 1) and the probability that a  
>> zero comes from the zero-inflation process (trait 2). At the moment  
>> you have a common intercept for both and therefore assume that the  
>> probability that a zero is from the zero-inflation process (on the  
>> logit scale) is equal to the mean Poisson count (on the log scale).  
>> There is probably little justification for this (but see ZAP  
>> models). Moreover, you have a single animal term, and therefore  
>> assume that both processes have the same genetic variance and that  
>> the genetic correlation between them is 1. You also assume that the  
>> residual variance for each process is equal (but the residual  
>> correlation is 0), although this is not actually that bigger deal  
>> because residual variation in the zero-inflation process (and  
>> residual corrletaion) is not identifiable from the data and so the  
>> value it could take is essentially arbitrary. However, I like to  
>> recognise this non-identifiability by fixing the variance at 1 and  
>> the covariance at zero (but see ZAP models where your specification  
>> is useful).
>>
>> So, a more appropriate model would be:
>>
>> model1.1<-MCMCglmm(Toes~trait,random=~us(trait):animal,family="zipoisson",rcov=~idh(trait):units,pedigree=ped,data=data,prior=prior1.1,nitt=500000,thin=500,burnin=200000,verbose=FALSE) or  
>> perhaps
>>
>> model1.2<-MCMCglmm(Toes~trait,random=~idh(trait):animal,family="zipoisson",rcov=~idh(trait):units,pedigree=ped,data=data,prior=prior1.1,nitt=500000,thin=500,burnin=200000,verbose=FALSE) The difference between these two is that the genetic correlation between the zero-inflation and the Poisson processes is estimated, and in the latter it is set to  
>> 0.
>>
>> I'm not sure about how much data you have, but I would think that  
>> if the incidence of deformed toes is very low, you probably have  
>> little information for some of the parameters. In this case you  
>> have to be VERY careful with priors. The prior recommendations in  
>> the MCMCglmm tutorial in Wilson et al. 2011 will generally be quite  
>> informative for small or even moderate sized datasets (the authors  
>> have kindly published an erratum as many people seem to be using  
>> their recommendation without realising its implications).
>>
>> The prior could look like:
>>
>> prior=list(R=list(V=diag(2), nu=0, fix=2), G=list(G1=G1))
>>
>> which fixes the residual variance in the zero-inflation process to  
>> 1 with a flat improper prior on the Poisson over-dispersion. For  
>> the prior on the genetic (co)variances (G1) I generally use  
>> parameter expanded priors of the form:
>>
>> G1=list(V=diag(2), nu=2, alpha.mu=c(0,0), alpha.V=diag(2)*1000)
>>
>> or
>>
>> G1=list(V=diag(2), nu=1, alpha.mu=c(0,0), alpha.V=diag(2)*1000)
>>
>> for model.2.
>>
>> These generally give posterior modes for the variance components  
>> that are close to (RE)ML estimates, although it is unclear to me  
>> how they behave for the covariances and functions of variances  
>> (e.g. heritability). At least for binary traits (e.g. the  
>> zero-inflation bit) I have seen a chi-square prior recommended  
>> because of its  prior properties on the heritability (de  
>> Villemereuil in MEE). This type of prior can also be obtained using  
>> parameter expansion:
>>
>> (V=1, nu=1000, alpha.mu=0, alpha.V=1)
>>
>> and I would guess that the multivariate analogue would be  
>> (V=diag(2), nu=1001, alpha.mu=c(0,0), alpha.V=diag(2)). However, a  
>> chi-square prior probably has poor properties for the Poisson  
>> variance and in MCMCglmm you could only mix these different priors  
>> for model.2 using a different syntax:
>>
>> model1.2<-MCMCglmm(Toes~trait,random=~idh(at(trait,1)):animal+idh(at(trait,2)):animal,family="zipoisson",rcov=~idh(trait):units,pedigree=ped,data=data,prior=prior1.1,nitt=500000,thin=500,burnin=200000,verbose=FALSE)  
>> G=list(
>>    G1=list(V=1, nu=1, alpha.mu=0, alpha.V=1000),
>>    G2=list(V=1, nu=1000, alpha.mu=0, alpha.V=1)
>>  )
>>
>>
>> Cheers,
>>
>> Jarrod
>>
>>
>>
>>
>> Quoting Helen Ward <h.l.ward at qmul.ac.uk> on Wed, 04 Jul 2012 16:24:36 +0100:
>>
>>> Dear list,
>>>
>>> I work on bats, some of which have deformed toes. I have just started
>>> trying to use MCMCglmm to investigate whether there is hertiable
>>> variation for the number of deformed toes a bat has. Since most bats
>>> have 0 deformed toes I am using a zero-inflated Poisson model.
>>>
>>> I have put together some code (see below) based on the MCMCglmm tutorial
>>> from the Wilson et al. 2011 'Ecologist's guide to the animal model'
>>> paper, the MCMCglmm Course Notes and trial and error. This code runs and
>>> I get a model and a heritability estimate. However, the heritability
>>> estimate I get is extremely high and this, teamed with the observations
>>> that my posterior distributions of the variance componants are not
>>> pretty as the example ones I've seen and my autocorrelation values are
>>> high (~0.4), makes me suspicious that I am not modelling this trait very
>>> well. I suspect I have over simplified the model. I based my prior on a
>>> model done in ASReml modelling 'Toes' as a Poisson distribution, which
>>> can an heritability estimate of 0.35.
>>>
>>> In addition, when I try and add a second random variable - year of birth
>>> (Born) - into a second model, both plots and auto correlation get worse
>>> and my heritability estimate for number of deformed toes (Toes) goes
>>> from about 0.8 to 0.05!
>>>
>>> Finally, I suspect you're not, but I would like to know if you are
>>> allowed to compare DIC values from models with different priors please.
>>>
>>> I have pasted my code below. If anyone can offer constructive criticism
>>> it would be much appreciated.
>>>
>>> Many thanks,
>>> Helen
>>>
>>> Model 1: With number of deformed toes (Toes) as the only random variable
>>>
>>>
>>> p.var<-var(data$Toes,na.rm=TRUE)
>>> prior1.1<-list(G=list(G1=list(V=matrix(p.var*0.35),n=1)),
>>> R=list(V=matrix(p.var*0.65),n=1))
>>>
>>> model1.1<-MCMCglmm(Toes~1,random=~animal,family="zipoisson",rcov=~trait:units,pedigree=ped,data=data,prior=prior1.1,nitt=500000,thin=500,burnin=200000,verbose=FALSE) posterior.heritability1.1<-model1.1$VCV[,"animal"]/(model1.1$VCV[,"animal"]+model1.1$VCV[,"trait:units"])  
>>> HPDinterval(posterior.heritability1.1,0.95)
>>> posterior.mode(posterior.heritability1.1)
>>>
>>>
>>> Model 2: With number of toes (Toes) and year of birth (Born) as random
>>> variables
>>>
>>>
>>> p.var<-var(data$Toes,na.rm=TRUE)
>>> prior1.2<-list(G=list(G1=list(V=matrix(p.var/3),n=1),G2=list(V=matrix(p.var/3),n=1)),R=list(V=matrix(p.var/3),n=1))  
>>> model1.2<-
>>> MCMCglmm(Toes~1,random=~animal+Born,family="zipoisson",rcov=~trait:units,pedigree=ped,data=data,prior=prior1.2,nitt=1000000,thin=1000,burnin=500000,verbose=FALSE) posterior.heritability1.2<-model1.2$VCV[,"animal"]/(model1.2$VCV[,"animal"]+model1.2$VCV[,"Born"]+model1.2$VCV[,"trait:units"])  
>>> HPDinterval(posterior.heritability1.2,0.95)
>>> posterior.mode(posterior.heritability1.2)
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>    [[alternative HTML version deleted]]
>>>
>>> _______________________________________________
>>> R-sig-mixed-models at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>>>
>>>
>>
>>
>>
>
>
>



-- 
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.



More information about the R-sig-mixed-models mailing list