[R-sig-ME] ZIP MCMCglmm

Thu Oct 11 14:20:50 CEST 2012

Hi Jarrod, 
Thank you very much for your detailed help! I made some silly mistakes and had to think about some of your answers for a while. Unfortunately, these raise more questions. 
This is my most recent "working" r-code:

prior<-list(R=list(V=diag(2),nu=0.002,fix=2), G=list(G1=list(V=diag(2),nu=0.002)))

Mz<-MCMCglmm(sum_ppho ~ trait-1 + at.level(trait,1):year*
      at.level(trait,1):we+
      at.level(trait,1):(I(cdayofyear^2)+at.level(trait,1):dayofyear)+
      trait:sqkm,
random=~us(trait):EFF_Track, family="zipoisson",pr=TRUE,pl=TRUE,
      data=encounter, rcov=~idh(trait):units, prior=prior, verbose=FALSE,
      nitt=10000,thin =100, burnin = 1000)
summary(Mz)

This gives more or less the expected results, however, I'm not content with its implications.
First, I'm not sure if I misunderstand something, as you can see, I'm still fairly inexperienced. But probably my explanation of "sqkm" was not clear in my last email. 
It is a correction factor, which specifies for each transect "Eff_Track" it's effective observation area. For example in suboptimal weather conditions, you are only able to detect porpoises in shorter distances than in the next year in perfect weather conditions. So you decrease the observed area of transects in the correction factor. So I thought putting it in the random slope term, would account for that. Putting it as a fixed factor is probably of minor interest, as everyone would assume, that you see more porpoises in bigger survey areas. So basically it was my purpose to have the overall slope=0. This still leaves the problem that I need to allow for different slopes and intercepts for the zero-inflation and the poisson part. I tried fitting several terms but could not get the dimensions of the prior right. 

Also, my explanation of g(0) was a very simplified version of the real assumptions, I'm sorry for that. Yes, g(0) both corrects for unavailability bias (that could not be seen, because porpoises were diving) and the perception bias of those missed because of "sleeping" surveyors. Typically g(0) = <0.5, so more than half of the porpoises are missed by surveyors. Which means, quite some zeros will be due to false zeros. 
Does the ZIP model include any assumptions on the probability that animals are missed on transects with detections? I understood, it only accounts for the probability that a zero is a zero from the poisson process or a false zero. 
In fact we have an estimate of g(0) derived from distance sampling methods. I was wondering if it would be possible to use the known/estimated g(0) as starting value for a zero-inflation prior? So far I wasn't able to find any information on how to integrate starting values for latent variables. It seems like a fancy way of dealing with that problem, but maybe that will also exceed my skills. Or is there another way of dealing with that problem?
I believe correcting our counts directly will give even worse results. We give up the ínteger properties of our variable and herewith change the distribution of data. Even worse with our original methods, we only corrected the counts, but not the zeros (by dividing through correction factors)
cheers,
Katrin

-------- Original-Nachricht --------
> Datum: Sun, 07 Oct 2012 00:07:49 +0100
> Von: Jarrod Hadfield <j.hadfield at ed.ac.uk>
> An: Katrin Ronnenberg <ronnenberg at gmx.com>
> CC: r-sig-mixed-models at r-project.org
> Betreff: Re: [R-sig-ME] ZIP MCMCglmm

> Hi,
> 
> A) The residual variance for the zero-inflation is not identified nor  
> is the residual covariation between the zero-inflation and the Poisson  
> part. However, you are attempting to estimate both. Use:
> 
> rcov=idh(trait):units
> 
> and fix the [2,2] element of the covariance matrix at something (e.g  
> 1) in the prior (i.e. add fix=2 to your R prior).
> 
> B) you have no unique intercepts for the zero inflation and the  
> Poisson part. You should have trait as a main effect in the fixed  
> formula (or trait-1 might be easier to interpret).
> 
> C)
> 
>   random= ~us(1+sqkm):EFF_Track
> 
> assumes that the regression parameters (intercept and slope) of sqkm  
> on log Poisson rate and logit zero-inflation are identical within  
> EFF_Tracks, and that the slope=0 on average (because its not in the  
> fixed formula). I would have one of the following in the fixed effects:
> 
> a) trait:sqkm
> b) at.level(trait,1):sqkm
> c) at.level(trait,2):sqkm
> 
> and one of the following in the random effects.
> 
> i) random= ~us(trait):EFF_Track
> ii) random= ~idh(trait):EFF_Track
> iii) random= ~us(at.level(trait,1)):EFF_Track
> iv) random= ~us(at.level(trait,2)):EFF_Track
> 
> and make sure that if i) or ii) to have a),  if iii) to have a) or b)  
> and if iv) to have a) or c).
> 
> If you use ZAP rather than ZIP then fixed=~sqkm and random= ~EFF_Track  
> is OK too.
> 
> D) DIC should not be trusted.
> 
> E) Not sure, but it sounds like g(0) is correcting for missed  
> sightings of individuals present (?) and therefore refers to the  
> Poisson part of the model.  The zero-inflation part may also be due to  
> missed sightings of individuals present (e.g. on some transects the  
> observers had their eyes closed!) but it is possibly more likely to be  
> due to absence of porpoises? I don't know.
> 
> Cheers,
> 
> Jarrod
> 
> 
> 
> 
> 
> 
> 
> 
> 
> Quoting Katrin Ronnenberg <ronnenberg at gmx.com> on Fri, 05 Oct 2012  
> 12:31:12 +0200:
> 
> > Dear all,
> > I have several questions on a zero-inflated MCMCglmm model.
> >
> > Data consist of 764 transects with porpoise counts ("sum_ppho"). On  
> > 304 transects zero porpoises were seen. Data come from several  
> > studies from the North Sea, they were all sampled in roughly the  
> > same area and from altogether 12 years. We are interested in a trend  
> > ("year"), seasonal effects ("dayofyear" quadratic (centered) Term +  
> > the simple term. Moreover we have a simple geographical variable  
> > "we"; a factor with two levels "West" and "east". The transect ID  
> > (EFF_Track) I integrated as random factor, assuming that transects  
> > are not independant of each other I used the us structure. Most  
> > transects were revisited over several years, but not all of them.  
> > Since Transects had differing length and visibility conditions,  
> > which affected the effective area sampled, I used the effective area  
> > ("sqkm") as random slope term.
> >
> >
> >
> > in MCMCglmm I tried the following:
> >
> > prior<-list(R=list(V=diag(2),nu=0.002),
> G=list(G1=list(V=diag(2),nu=0.002)))
> >
> > Mz<-MCMCglmm(sum_ppho~at.level(trait,1):year*at.level(trait,1):we+
> > +           at.level(trait,1):(I(cdayofyear^2)+  
> > at.level(trait,1):dayofyear), random= ~us(1+sqkm):EFF_Track,
> > +              family="zipoisson",
> > +              data=encounter, rcov=~us(trait):units, prior=prior,  
> > verbose=FALSE,
> > +              nitt=100000,thin =1000, burnin = 5000)
> >> summary(Mz)
> >
> > Iterations = 5001:99001
> >  Thinning interval  = 1000
> >  Sample size  = 95
> >
> >  DIC: 2843.102
> >
> >  G-structure:  ~us(1 + sqkm):EFF_Track
> >
> >                                   post.mean l-95% CI u-95% CI eff.samp
> > (Intercept):(Intercept).EFF_Track   3.62368  1.21791  6.03721    6.215
> > sqkm:(Intercept).EFF_Track         -0.38983 -0.61860 -0.15801   14.857
> > (Intercept):sqkm.EFF_Track         -0.38983 -0.61860 -0.15801   14.857
> > sqkm:sqkm.EFF_Track                 0.05025  0.02699  0.07369   42.211
> >
> >  R-structure:  ~us(trait):units
> >
> >                               post.mean  l-95% CI u-95% CI eff.samp
> > sum_ppho:sum_ppho.units          0.8150  0.673652   1.0178   69.615
> > zi_sum_ppho:sum_ppho.units       0.1894 -0.472509   0.6914    7.978
> > sum_ppho:zi_sum_ppho.units       0.1894 -0.472509   0.6914    7.978
> > zi_sum_ppho:zi_sum_ppho.units    0.1561  0.001532   0.6305    8.428
> >
> >  Location effects: sum_ppho ~ at.level(trait, 1):year *  
> > at.level(trait, 1):we + at.level(trait, 1):(I(cdayofyear^2) +  
> > at.level(trait, 1):dayofyear)
> >
> >                                     post.mean   l-95% CI   u-95% CI  
> > eff.samp  pMCMC
> > (Intercept)                        -3.315e+00 -4.486e+00 -2.045e+00   
> >   1.993  <0.01 *
> > at.level(trait, 1):year             1.673e-03  1.111e-03  2.169e-03   
> >   3.976  <0.01 *
> > at.level(trait, 1):wew             -5.459e+02 -6.696e+02 -4.612e+02   
> >  40.273  <0.01 *
> > at.level(trait, 1):I(cdayofyear^2) -3.117e-05 -5.063e-05 -7.107e-06   
> >  95.000  <0.01 *
> > at.level(trait, 1):dayofyear       -1.726e-03 -3.851e-03  2.335e-05   
> >  59.665 0.0632 .
> > at.level(trait, 1):year:wew         2.727e-01  2.305e-01  3.343e-01   
> >  40.388  <0.01 *
> > ---
> >
> >
> >
> >
> > In general I feel insecure if my assumptions are right and whether  
> > prior specifications make sense. But I have two main questions:
> > 1. Question: Am I correct in the assumption, that, if the output for  
> > autocorr(M1$Sol) and autocorr(M1$VCV) are low, that spatial  
> > autocorrelation is no issue? If I specify a relatively simple model  
> > (only main effects), autocorrelation is no issue, however the more  
> > interaction terms I include the stronger the autocorrelation  
> > becomes. Increasing itterations by a factor of ten to:  
> > nitt=100000,thin =5000, burnin = 10000, hardly changed anything  
> > about that. Is there anything else I could try? I know that I can  
> > increase nitt... further and ask Malcolm about the sir() function.  
> > But I'm especially interested, if experimenting with the random term  
> > has a chance of improving things?
> > 2. Question: Following methods of distance sampling, a G(0) factor  
> > was estimated compensating for the diving porpoises, which escaped  
> > the surveyors. Originally we corrected the observations by dividing  
> > the counts by the G(0) factor. So, basically only the transects with  
> > counts were corrected. Am I right in the assumption that the latent  
> > variable in a ZIP model compensates for all values with false zeros  
> > i.e. also the false zero for the third animal in a transect with 2  
> > counted animals? If that's the case, I would just not include any  
> > correction for g(0). But if the latent variable only compensates for  
> > false zeros of all zero values, but not for other values from the  
> > poisson distribution, then I would assume that the correction factor  
> > is needed. Is that the difference between a hurdle and a ZIP model?  
> > ZIP was in my case better than hurdle according to DIC.
> >
> > I'm sorry, I hope you can understand my questions. I'm neither a  
> > statistician, nor a native speaker, so I would like to appologize  
> > for bad choice of words. And please let me know, if a representive  
> > sample of my data would help.
> > Thank you for reading this, ideas and suggestions are very much
> appreciated.
> >
> > Katrin
> >
> > _______________________________________________
> > R-sig-mixed-models at r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
> >
> >
> 
> -- 
> The University of Edinburgh is a charitable body, registered in
> Scotland, with registration number SC005336.
> 
>