[R-sig-ME] A zero inflated Poisson model in MCMCglmm

Jarrod Hadfield j.hadfield at ed.ac.uk
Fri Jul 13 17:50:26 CEST 2012


Hi,

It will not be well behaved  - you need:

R=list(V=1,fix=1)

in the prior.

pl=TRUE goes into your MCMCglmm() call.

Parameter expansion is currently only implemented for the G-structure.

Cheers,

Jarrod



Quoting Helen Ward <h.l.ward at qmul.ac.uk> on Fri, 13 Jul 2012 16:34:36 +0100:

> Hello,
>
> Thank you for another thorough response.
>
> I've decided to stick with the ordinal model, although I can't work  
> out where to put the pl=TRUE to work out the range of latent  
> variables.
>
> I just used Vanimal/(Vanimal+Vresidual) to get heritability, rather  
> than Vanimal/(Vanimal+Vresidual+1). I have now tried the latter  
> though and the answer is very similar.
>
> Unfortunately running the model for longer and giving it a longer  
> burnin has not reduced the autocorrelation. When you say 'parameter  
> expansion', does that mean of G1 in the prior?
>
> I've carried on using this prior:
>
> prior2<-list(G=list(G1=list(V=matrix(p.var/2),n=1)),  
> R=list(V=matrix(p.var/2),n=1))
>
> in which I think the variance is set to 1.
>
> Thank you again: all help is very much appreciated.
>
> Helen
>
> On 12/07/2012 14:13, Jarrod Hadfield wrote:
>> Hi,
>>
>> Quoting Helen Ward <h.l.ward at qmul.ac.uk> on Wed, 11 Jul 2012 11:58:50 +0100:
>>
>>> Dear Jarrod and David,
>>>
>>> Thank you both very much for your responses: I'm sorry it's taken  
>>> me so long to say that, but I wanted to have a proper go at  
>>> everything before I replied, which took longer than I anticipated!
>>>
>>> Thank you especially for your zipoisson code, Jarrod. When I  
>>> initially attempted to use your prior and model MCMCglmm asked me  
>>> for a stronger prior, so I changed it to
>>>
>>>
>>> Prior1=list(R=list(V=diag(2),nu=1, fix=2), G=list(G1=G1)) : is  
>>> that a reasonable thing to have done? The model
>>>
>>>
>> I think it is better to diagnose the problem rather than upping nu.  
>> In this case I think the residual variance of the Poisson process  
>> must be going to zero under the flat prior? Perhaps the counts are  
>> under-dispersed with respect to the Poisson, further motivating an  
>> ordinal model?
>>
>>
>>
>> model1.1<-MCMCglmm(Toes~trait,random=~us(trait):animal,family="zipoisson",rcov=~idh(trait):units,pedigree=ped,data=data,prior=Prior1,nitt=500000,thin=500,burnin=200000,verbose=FALSE)
>>>
>>> ran with this prior, however I admit I am now flummoxed as to how  
>>> to proceed and go about estimating heritability.
>>>
>>>
>>> For this reason, (and because you both recommended it!), I went on  
>>> to try an ordinal model using the following code:
>>>
>>> p.var<-var(data$Toes,na.rm=TRUE)
>>>
>>> prior2<-list(G=list(G1=list(V=matrix(p.var/2),n=1)),  
>>> R=list(V=matrix(p.var/2),n=1))
>>>
>>>
>> IMPORTANT: The residual variance is not identifiable in an ordinal  
>> model so all the information is coming from the prior. This is OK,  
>> but I would fix the variance at 1 so as to avoid interpretation  
>> difficulties (and numerical problems - check the range of the  
>> latent variables using pl=TRUE).
>>
>>
>> model2<-MCMCglmm(Toes~1,random=~animal,family="ordinal",pedigree=ped,data=data,prior=prior2,nitt=500000,thin=500,burnin=200000,verbose=FALSE)
>>>
>>>
>>> followed by the same model but with the prior from the Wilson  
>>> paper erratum:
>>>
>>> prior2.1 <- list(G = list(G1 = list(V = 1, nu = 0.002)), R =  
>>> list(V = 1,nu = 0.002))
>>>
>>>
>>> the plots for these models look ok, and they both give a  
>>> heritability estimate of ~0.55, with very similar 95% HDP  
>>> intervals, although the auto correlation is quite high ~0.6
>>
>> Is this calculated as Vanimal/(Vanimal+Vresidual+1)  or  
>> Vanimal/(Vanimal+Vresidual)?
>>
>>>
>>>
>>> Next I tried a categorical model, treating Toes as a binary  
>>> response variable (deforemed or not deformed), using
>>>
>>> p.var<-var(data$ToesBinary,na.rm=TRUE)
>>>
>>> prior3= list(R = list(V = 1, n = 0, fix = 1), G = list(G1 = list(V  
>>> = 1,n = 0)))
>>>
>>> model3<-  
>>> MCMCglmm(ToesBinary~1,random=~animal,family="categorical",data=data,pedigree=ped,prior=prior3,nitt=200000,thin=250,burnin=100000,verbose =  
>>> FALSE)
>>>
>>>
>>> Heritability came out higher with this model at about ~0.85.
>>
>> Is this calculated as Vanimal/(Vanimal+Vresidual+pi^2/3)  or  
>> Vanimal/(Vanimal+Vresidual)?
>>>
>>>
>>> And then last but not least a multinomial2 model:
>>>
>>> prior4=list(R=list(V=1,nu=0.002),G=list(G1=list(V=1,nu=1,alpha.mu=0,alpha.V=100))) model4<-MCMCglmm(cbind(Deformed,Fine)~1,random=~animal,family="multinomial2",data=data,pedigree=ped,prior=prior4,nitt=200000,thin=250,burnin=100000,verbose=FALSE) The variance plots for this one didn't look very pretty and the autocorrelation values were still quite high, ~0.8, but it gave me a pretty heritability plot and an estimate similar to my ordinal  
>>> models.
>>>
>>>
>>> As you can probably gather from all of this, I am something of an  
>>> indecisive person...., but at this stage I am inlcined to stick  
>>> with the ordinal models and believe that there is a heritable  
>>> element to the number of deformed toes (which seems rather  
>>> intuitive after all that!). Given this decision, can you recommend  
>>> anything I might do to reduce the autocorrelation in these models.  
>>> Also, do I have to run the same model many times to confirm its  
>>> conclusions?
>>
>> Re: autocorrelation. Use parameter expansion and/or run for longer  
>> (particularly for binary animal models and/or if there is little  
>> overdispersion)  If the absolute value of the latent variables  
>> under the logit link exceed 20, or under the probit link exceed 7,  
>> then at the moment I would consider another program unless the  
>> latent variables are associated with particular levels of a fixed  
>> effect.  Then flatish priors on the probability scale may overcome  
>> the problem.
>>
>>
>>>
>>>
>>> Finally finally, I will have to do another zipoisson model in the  
>>> future, can you direct me to where I might find an example of the  
>>> code used for estimating heritability in these cases please?
>>
>> I think it makes most sense to think about the heritabilities of  
>> each process separately   - the first as a Poisson trait and the  
>> second as a threshold (binary) trait.
>>
>>>
>>>
>>> Many many thanks once more,
>>> Helen
>>>
>>>
>>>
>>>
>>> On 08/07/2012 12:03, Jarrod Hadfield wrote:
>>>> Hi,
>>>>
>>>> I agree with David that an ordinal model may behave better for  
>>>> this type of data (or even a 0/1 response with deformed or not)  
>>>> but in case you want to pursue a ZIP model...
>>>>
>>>> The model you have specified is probably not what you had in  
>>>> mind. You should think of the ZIP response as being two responses  
>>>> (traits): the Poisson counts (trait 1) and the probability that a  
>>>> zero comes from the zero-inflation process (trait 2). At the  
>>>> moment you have a common intercept for both and therefore assume  
>>>> that the probability that a zero is from the zero-inflation  
>>>> process (on the logit scale) is equal to the mean Poisson count  
>>>> (on the log scale). There is probably little justification for  
>>>> this (but see ZAP models). Moreover, you have a single animal  
>>>> term, and therefore assume that both processes have the same  
>>>> genetic variance and that the genetic correlation between them is  
>>>> 1. You also assume that the residual variance for each process is  
>>>> equal (but the residual correlation is 0), although this is not  
>>>> actually that bigger deal because residual variation in the  
>>>> zero-inflation process (and residual corrletaion) is not  
>>>> identifiable from the data and so the value it could take is  
>>>> essentially arbitrary. However, I like to recognise this  
>>>> non-identifiability by fixing the variance at 1 and the  
>>>> covariance at zero (but see ZAP models where your specification  
>>>> is useful).
>>>>
>>>> So, a more appropriate model would be:
>>>>
>>>> model1.1<-MCMCglmm(Toes~trait,random=~us(trait):animal,family="zipoisson",rcov=~idh(trait):units,pedigree=ped,data=data,prior=prior1.1,nitt=500000,thin=500,burnin=200000,verbose=FALSE) or  
>>>> perhaps
>>>>
>>>> model1.2<-MCMCglmm(Toes~trait,random=~idh(trait):animal,family="zipoisson",rcov=~idh(trait):units,pedigree=ped,data=data,prior=prior1.1,nitt=500000,thin=500,burnin=200000,verbose=FALSE) The difference between these two is that the genetic correlation between the zero-inflation and the Poisson processes is estimated, and in the latter it is set to  
>>>> 0.
>>>>
>>>> I'm not sure about how much data you have, but I would think that  
>>>> if the incidence of deformed toes is very low, you probably have  
>>>> little information for some of the parameters. In this case you  
>>>> have to be VERY careful with priors. The prior recommendations in  
>>>> the MCMCglmm tutorial in Wilson et al. 2011 will generally be  
>>>> quite informative for small or even moderate sized datasets (the  
>>>> authors have kindly published an erratum as many people seem to  
>>>> be using their recommendation without realising its implications).
>>>>
>>>> The prior could look like:
>>>>
>>>> prior=list(R=list(V=diag(2), nu=0, fix=2), G=list(G1=G1))
>>>>
>>>> which fixes the residual variance in the zero-inflation process  
>>>> to 1 with a flat improper prior on the Poisson over-dispersion.  
>>>> For the prior on the genetic (co)variances (G1) I generally use  
>>>> parameter expanded priors of the form:
>>>>
>>>> G1=list(V=diag(2), nu=2, alpha.mu=c(0,0), alpha.V=diag(2)*1000)
>>>>
>>>> or
>>>>
>>>> G1=list(V=diag(2), nu=1, alpha.mu=c(0,0), alpha.V=diag(2)*1000)
>>>>
>>>> for model.2.
>>>>
>>>> These generally give posterior modes for the variance components  
>>>> that are close to (RE)ML estimates, although it is unclear to me  
>>>> how they behave for the covariances and functions of variances  
>>>> (e.g. heritability). At least for binary traits (e.g. the  
>>>> zero-inflation bit) I have seen a chi-square prior recommended  
>>>> because of its  prior properties on the heritability (de  
>>>> Villemereuil in MEE). This type of prior can also be obtained  
>>>> using parameter expansion:
>>>>
>>>> (V=1, nu=1000, alpha.mu=0, alpha.V=1)
>>>>
>>>> and I would guess that the multivariate analogue would be  
>>>> (V=diag(2), nu=1001, alpha.mu=c(0,0), alpha.V=diag(2)). However,  
>>>> a chi-square prior probably has poor properties for the Poisson  
>>>> variance and in MCMCglmm you could only mix these different  
>>>> priors for model.2 using a different syntax:
>>>>
>>>> model1.2<-MCMCglmm(Toes~trait,random=~idh(at(trait,1)):animal+idh(at(trait,2)):animal,family="zipoisson",rcov=~idh(trait):units,pedigree=ped,data=data,prior=prior1.1,nitt=500000,thin=500,burnin=200000,verbose=FALSE)  
>>>> G=list(
>>>>   G1=list(V=1, nu=1, alpha.mu=0, alpha.V=1000),
>>>>   G2=list(V=1, nu=1000, alpha.mu=0, alpha.V=1)
>>>> )
>>>>
>>>>
>>>> Cheers,
>>>>
>>>> Jarrod
>>>>
>>>>
>>>>
>>>>
>>>> Quoting Helen Ward <h.l.ward at qmul.ac.uk> on Wed, 04 Jul 2012  
>>>> 16:24:36 +0100:
>>>>
>>>>> Dear list,
>>>>>
>>>>> I work on bats, some of which have deformed toes. I have just started
>>>>> trying to use MCMCglmm to investigate whether there is hertiable
>>>>> variation for the number of deformed toes a bat has. Since most bats
>>>>> have 0 deformed toes I am using a zero-inflated Poisson model.
>>>>>
>>>>> I have put together some code (see below) based on the MCMCglmm tutorial
>>>>> from the Wilson et al. 2011 'Ecologist's guide to the animal model'
>>>>> paper, the MCMCglmm Course Notes and trial and error. This code runs and
>>>>> I get a model and a heritability estimate. However, the heritability
>>>>> estimate I get is extremely high and this, teamed with the observations
>>>>> that my posterior distributions of the variance componants are not
>>>>> pretty as the example ones I've seen and my autocorrelation values are
>>>>> high (~0.4), makes me suspicious that I am not modelling this trait very
>>>>> well. I suspect I have over simplified the model. I based my prior on a
>>>>> model done in ASReml modelling 'Toes' as a Poisson distribution, which
>>>>> can an heritability estimate of 0.35.
>>>>>
>>>>> In addition, when I try and add a second random variable - year of birth
>>>>> (Born) - into a second model, both plots and auto correlation get worse
>>>>> and my heritability estimate for number of deformed toes (Toes) goes
>>>>> from about 0.8 to 0.05!
>>>>>
>>>>> Finally, I suspect you're not, but I would like to know if you are
>>>>> allowed to compare DIC values from models with different priors please.
>>>>>
>>>>> I have pasted my code below. If anyone can offer constructive criticism
>>>>> it would be much appreciated.
>>>>>
>>>>> Many thanks,
>>>>> Helen
>>>>>
>>>>> Model 1: With number of deformed toes (Toes) as the only random variable
>>>>>
>>>>>
>>>>> p.var<-var(data$Toes,na.rm=TRUE)
>>>>> prior1.1<-list(G=list(G1=list(V=matrix(p.var*0.35),n=1)),
>>>>> R=list(V=matrix(p.var*0.65),n=1))
>>>>>
>>>>> model1.1<-MCMCglmm(Toes~1,random=~animal,family="zipoisson",rcov=~trait:units,pedigree=ped,data=data,prior=prior1.1,nitt=500000,thin=500,burnin=200000,verbose=FALSE) posterior.heritability1.1<-model1.1$VCV[,"animal"]/(model1.1$VCV[,"animal"]+model1.1$VCV[,"trait:units"])  
>>>>> HPDinterval(posterior.heritability1.1,0.95)
>>>>> posterior.mode(posterior.heritability1.1)
>>>>>
>>>>>
>>>>> Model 2: With number of toes (Toes) and year of birth (Born) as random
>>>>> variables
>>>>>
>>>>>
>>>>> p.var<-var(data$Toes,na.rm=TRUE)
>>>>> prior1.2<-list(G=list(G1=list(V=matrix(p.var/3),n=1),G2=list(V=matrix(p.var/3),n=1)),R=list(V=matrix(p.var/3),n=1))  
>>>>> model1.2<-
>>>>> MCMCglmm(Toes~1,random=~animal+Born,family="zipoisson",rcov=~trait:units,pedigree=ped,data=data,prior=prior1.2,nitt=1000000,thin=1000,burnin=500000,verbose=FALSE) posterior.heritability1.2<-model1.2$VCV[,"animal"]/(model1.2$VCV[,"animal"]+model1.2$VCV[,"Born"]+model1.2$VCV[,"trait:units"])  
>>>>> HPDinterval(posterior.heritability1.2,0.95)
>>>>> posterior.mode(posterior.heritability1.2)
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>   [[alternative HTML version deleted]]
>>>>>
>>>>> _______________________________________________
>>>>> R-sig-mixed-models at r-project.org mailing list
>>>>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>>
>>
>>
>>
>
>
>


-- 
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.



More information about the R-sig-mixed-models mailing list