[R-sig-eco] modelling zero inflated continuous data using a two stage model in glmmpql - question about implementing the delta method

Fri Jan 30 20:37:35 CET 2015

Dear R-sig-ecology group,

I have some telemetry data with a zero-inflated (semi-continuous)  
response, which I modelled using a two stage process using  
glmmpql/nmle. I chose nlme/glmmpql because a temporal correlation  
structure is very important for my telemetry data and I can live with  
pql estimation and no AICs if it means I can avoid winBUGS!.

I am pretty happy with my models and I don’t want to use  
ZIPglmms,ZINBglmms or ZAPglmms as the response is a pretty normal  
continuous variable and not a count.

So now, if possible I would like to combine the results of the two  
models to make predictions about resource selection (with Confidence  
intervals) for each of the levels of my fixed effects, using the  
‘delta’ approach, mentioned a few times on r-sig-eco including here

https://stat.ethz.ch/pipermail/r-sig-ecology/2011-May/002159.html

I found quite a few papers about the delta approach (typically using  
abundance estimates as the response), but I am having trouble  
translating the method into code (I’ve added refs at the end of the  
message). I have seen papers where they appear to just multiply the  
predictions from the positive model by the probability from the first  
model (although the papers in question use Bayesian techniques  
multiplying the posterior modes) Can I do the same with the  
predictions from my frequentist approach, somehow that seems over  
simplistic?

Also the response for the positive part of my model normalizes nicely  
with a cube root transformation and many of the papers use log-normal  
positive responses- I’m assuming this doesn’t make any difference as  
long as I leave the back transformation until the very end?

Any help would be much appreciated. See below for model details

###############################################################################
Models- First I model the probability of a non zero value, (i.e.  
presence/absence) then model the continuous part i.e. the strength of  
selection in the event of a positive selection event.  Sorry I don’t  
have a real example dataset; however in this case I just need to find  
out how to extract the relevant stuff from lme/glmmpql output.

Wij – this is my resource selection response variable  (Wij=  
proportion of time spent in habitat/proportion of habitat available).  
This response varies from 0 (i.e. habitat was available but not used)  
continuously through 1 (neither selected nor avoided) and continuously  
 >1 (positive selection) up to a theoretical infinity.

Seas; a categorical 4 level variable
Sequence; a variable created based on detection date used for the  
correlation structure.
Habitat: a categorical habitat variable

Xdata$WijBin<-ifelse(Xdata$Wij>0,1,0)
XdataUsed<- Xdata[Xdata$WijBIN0==1,]

BinaryMod<-glmmpql(BinaryVariable~Seas+Habitat+Seas*Habitat,random=~1|FishID,cor=corCAR1(form=~Sequence),data=Xdata,family=”binomial”)

PositiveModel <-lme((Wij)^(1/3)~  
Seas+Habitat+Seas*Habitat,random=~1|FishID/TemperatureCategory,cor=corCAR1(form=~Sequence),data=XdataUsed)

#########################
Refs

Pennington, Michael. "Efficient estimators of abundance, for fish and  
plankton surveys." Biometrics (1983): 281-286.

Stefánsson, Gunnar. "Analysis of groundfish survey abundance data:  
combining the GLM and delta approaches." ICES Journal of Marine  
Science: Journal du Conseil 53.3 (1996): 577-588.

Anlauf-Dunn, Kara J., et al. "Habitat connectivity, complexity, and  
quality: predicting adult coho salmon occupancy and abundance."  
Canadian Journal of Fisheries and Aquatic Sciences 71.12 (2014):  
1864-1876.

Phil Harrison
PhD student (Fisheries Ecology)
Department of Biology
University of Waterloo
200 University Avenue West
Waterloo, Ontario, Canada
N2L 3G1