[R-sig-eco] modelling zero inflated continuous data using a two stage model in glmmpql - question about implementing the delta method

Scott Foster scott.foster at csiro.au
Tue Feb 3 22:52:54 CET 2015


Hi Phil,

This is just a suggestion, and details will have to be thought through carefully.  Well actually it is two suggestions (I would use the second).  
Please note that I have not delved deeply into your code and therefore I assume that you are doing something sensible at the modelling stage.

Suggestion number 1:  Use the delta method (don't get confused with delta model).  Basically, you take a linear approximation to the non-linear model 
and follow the parameter estimates' uncertainty through.  The calculus shouldn't be too onerous for these two part models.

Suggestion number 2:  Parametric bootstrap.  You know the (asymptotic) distribution of the estimates, through MLE theory.  If you are willing to 
assume that you are close enough to asymptotic properties, then you can draw a large number of the of possible/likely parameter values.  For each set 
of parameter values, do the prediction.  You will end up with a set of predictions that reflect the uncertainty in the parameters.

I noticed that you are also using a glmm in at least one part of your model.  Another choice needs to be made then: do you predict conditionally or 
marginally to the population of random effects.  If conditional, you specify the set of random effects to some value (usually zero or the observed 
levels).  If marginal you have to integrate out the stochasticity of the random effect's population. The approach taken will have an effect on both 
the mean response and the uncertainty around that mean.  However, in linear models, it will only affect the uncertainty (but your model isn't linear).

I hope that this helps,

Scott

On 04/02/15 08:21, Philip Harrison wrote:
> Update,
>
> So it seems predictions for the delta method ARE as simple as the probabily*estimates, which for the example can pretty easily computed using the 
> predict function (ive added my coding below)
>
> However does anyone know how to compute confidence intervals with this delta method? Im guessing its not as straightforward as the product of the 
> confidence intervals for the two models which are easily computed?
>
> Any help would be greatly appreciated?
>
> Cheers
>
> Phil
>
>
>
>
>
> Code
>
> #### generate a grid for prediction which is object "predition grid"
>
>
> library(AICmodavg)
>
> #generate probability predictions form the binomial model
>
> predoBin<-predictSE.lme(BinaryMod,predictiongrid,level=0,print.matrix=TRUE,se.fit=TRUE)
>
> predsBinG<-cbind(predictiongrid,predoBin)
>
> predsBinG$Probality<-plogis(predsBinG$fit)
> predsBinG$CIH.bin<-plogis(predsBinG$fit+(1.96*predsBinG$fit$se.fit))
> predsBinG$CIL.bin<-plogis(predsBinG$fit-(1.96*predsBinG$fit$se.fit))
>
>
> ###give the binomial model predictions an identifier
>
>
> ####generate (and backtransform from the cuberoot) predictions from the  positive model
>
> predoPos<-predictSE.lme(PositveMod,predictiongrid,level=0,print.matrix=TRUE,se.fit=TRUE)
> predsPosG<-cbind(PositiveModel,predoPos)
> predsPosG$PosWij<-(predsPosG$fit)^3
> predsBinG$CIH<-(predsPosG$PosWij+(1.96*predsPosG$se.fit))^3
> predsBinG$CIL<-(predsPosG$PosWij-(1.96*predsPosG$se.fit))^3
>
>
> DeltaMaker<-cbind(predBinG,predsPosG)
>
> #########get the conditional estimates
>
> DeltaMaker$ConditionalWij<-DeltaMaker$Probability*DeltaMaker$PosWij
>
>
> ####theorectical CI estimation (this is likely wrong???need to bootstrap or something?)
>
>
> DeltaMaker$ConditionalWijCIH<-DeltaMaker$CIH.bin*DeltaMaker$CIH
>
> DeltaMaker$ConditionalWijCIL<-DeltaMaker$CIL.bin*DeltaMaker$CIL
>
> ####im thinking maybe only using the CIs from the positive model might make sense?
>
> DeltaMaker$ConditionalWijCIH<-DeltaMaker$Probability*DeltaMaker$CIH
>
> DeltaMaker$ConditionalWijCIL<-DeltaMaker$Probability*DeltaMaker$CIL
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
> Quoting Philip Harrison <pharriso at uwaterloo.ca>:
>
>> Dear R-sig-ecology group,
>>
>> I have some telemetry data with a zero-inflated (semi-continuous) response, which I modelled using a two stage process using glmmpql/nmle. I chose 
>> nlme/glmmpql because a temporal correlation structure is very important for my telemetry data and I can live with pql estimation and no AICs if it 
>> means I can avoid winBUGS!.
>>
>> I am pretty happy with my models and I don't want to use ZIPglmms,ZINBglmms or ZAPglmms as the response is a pretty normal continuous variable and 
>> not a count.
>>
>> So now, if possible I would like to combine the results of the two models to make predictions about resource selection (with Confidence intervals) 
>> for each of the levels of my fixed effects, using the 'delta' approach, mentioned a few times on r-sig-eco including here
>>
>> https://stat.ethz.ch/pipermail/r-sig-ecology/2011-May/002159.html
>>
>> I found quite a few papers about the delta approach (typically using abundance estimates as the response), but I am having trouble translating the 
>> method into code (I've added refs at the end of the message). I have seen papers where they appear to just multiply the predictions from the 
>> positive model by the probability from the first model (although the papers in question use Bayesian techniques multiplying the posterior modes) 
>> Can I do the same with the predictions from my frequentist approach, somehow that seems over simplistic?
>>
>> Also the response for the positive part of my model normalizes nicely with a cube root transformation and many of the papers use log-normal 
>> positive responses- I'm assuming this doesn't make any difference as long as I leave the back transformation until the very end?
>>
>> Any help would be much appreciated. See below for model details
>>
>> ###############################################################################
>> Models- First I model the probability of a non zero value, (i.e. presence/absence) then model the continuous part i.e. the strength of selection in 
>> the event of a positive selection event.  Sorry I don't have a real example dataset; however in this case I just need to find out how to extract 
>> the relevant stuff from lme/glmmpql output.
>>
>> Wij - this is my resource selection response variable  (Wij= proportion of time spent in habitat/proportion of habitat available). This response 
>> varies from 0 (i.e. habitat was available but not used) continuously through 1 (neither selected nor avoided) and continuously >1 (positive 
>> selection) up to a theoretical infinity.
>>
>> Seas; a categorical 4 level variable
>> Sequence; a variable created based on detection date used for the correlation structure.
>> Habitat: a categorical habitat variable
>>
>> Xdata$WijBin<-ifelse(Xdata$Wij>0,1,0)
>> XdataUsed<- Xdata[Xdata$WijBIN0==1,]
>>
>> BinaryMod<-glmmpql(BinaryVariable~Seas+Habitat+Seas*Habitat,random=~1|FishID,cor=corCAR1(form=~Sequence),data=Xdata,family="binomial")
>>
>> PositiveModel <-lme((Wij)^(1/3)~ Seas+Habitat+Seas*Habitat,random=~1|FishID/TemperatureCategory,cor=corCAR1(form=~Sequence),data=XdataUsed)
>>
>>
>>
>> #########################
>> Refs
>>
>> Pennington, Michael. "Efficient estimators of abundance, for fish and plankton surveys." Biometrics (1983): 281-286.
>>
>> Stefánsson, Gunnar. "Analysis of groundfish survey abundance data: combining the GLM and delta approaches." ICES Journal of Marine Science: Journal 
>> du Conseil 53.3 (1996): 577-588.
>>
>> Anlauf-Dunn, Kara J., et al. "Habitat connectivity, complexity, and quality: predicting adult coho salmon occupancy and abundance." Canadian 
>> Journal of Fisheries and Aquatic Sciences 71.12 (2014): 1864-1876.
>>
>>
>>
>>
>> Phil Harrison
>> PhD student (Fisheries Ecology)
>> Department of Biology
>> University of Waterloo
>> 200 University Avenue West
>> Waterloo, Ontario, Canada
>> N2L 3G1
>>
>> _______________________________________________
>> R-sig-ecology mailing list
>> R-sig-ecology at r-project.org
>> https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
>>
>
>
>
> Philip Harrison MSc
> PhD student (Fisheries Ecology)
> Department of Biology
> University of Waterloo
> 200 University Avenue West
> Waterloo, Ontario, Canada
> N2L 3G1
> Cell:226-808-2309
> Email:pharriso at uwaterloo.ca
>
> _______________________________________________
> R-sig-ecology mailing list
> R-sig-ecology at r-project.org
> https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
>
>

-- 
Scott Foster
CSIRO
E scott.foster at csiro.au T +61 3 6232 5178
Postal address: CSIRO Marine Laboratories, GPO Box 1538, Hobart TAS 7001
Street Address: CSIRO, Castray Esplanade, Hobart Tas 7001, Australia
www.csiro.au



More information about the R-sig-ecology mailing list