[R-sig-eco] Question: Hurdle model cross validation

Justine Jackson-Ricketts jdjackso at ucsc.edu
Thu Jun 18 18:46:49 CEST 2015


Hello,

I am trying to run 10-fold cross validation on hurdle models that I ran
using the package "pscl". I am using the cvFit function in the package
"cvTools". I have run a hurdle model with a negative binomial distribution
and one with a Poisson distribution. At the request of one of my advisors,
I also ran a zero-inflated negative binomial model, also in "pscl".

My code for the hurdle models is
H1A<-hurdle(sightings~depth+temp+turbidity+chla+salinity+ph+dr+dc+calves,
data=hdata, dist="negbin", link="logit")

H1B<-hurdle(sightings~depth+temp+turbidity+chla+salinity+ph+dr+dc+calves,
data=hdata, dist="poisson", link="logit")

And for the ZINB:
Z1<-zeroinfl(sightings~depth+temp+turbidity+chla+salinity+ph+dr+dc+calves,
data=hdata, dist="negbin", link="logit")

I have tried two different sets of code for the cross-validation:

(1) cvFit(H1A, data=hdata, x=NULL, y=hdata$sightings, cost=rmspe, K=10,
R=10)

     This returns "NA"

(2) x <- c(hdata$depth, hdata$temp, hdata$turbidity, hdata$chla,
hdata$salinity, hdata$ph, hdata$dr, hdata$dc, hdata$calves)

cvFit(H1A, data=hdata, x=x, y=hdata$sightings, cost=rmspe, K=10, R=10)

     This returns "NA" and this error message:
     In sqrt(diag(vc_count)[kx + 1]) : NaNs produced
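
For comparison, a 10-fold cross validation can also be written as a plain loop that does not go through cvFit at all. The following is only a minimal sketch under stated assumptions: the names kfold_rmspe, fit_fun and pred_fun are hypothetical, the data are simulated stand-ins rather than hdata, and a Poisson GLM is used so that the example runs in base R without extra packages. The cost is the root mean squared prediction error, the same measure that rmspe names in "cvTools".

```r
# Minimal manual K-fold cross-validation sketch (base R only).
# kfold_rmspe, fit_fun and pred_fun are hypothetical names for this example.
kfold_rmspe <- function(data, response, fit_fun, pred_fun, K = 10, seed = 1) {
  set.seed(seed)
  # Randomly assign each row to one of K folds of near-equal size
  folds <- sample(rep(seq_len(K), length.out = nrow(data)))
  sq_err <- numeric(nrow(data))
  for (k in seq_len(K)) {
    test <- folds == k
    fit  <- fit_fun(data[!test, , drop = FALSE])        # refit on the training folds
    pred <- pred_fun(fit, data[test, , drop = FALSE])   # predict the held-out fold
    sq_err[test] <- (data[[response]][test] - pred)^2
  }
  sqrt(mean(sq_err))  # root mean squared prediction error over all rows
}

# Simulated stand-in data; NOT the hdata from this post
set.seed(42)
sim <- data.frame(depth = runif(200), temp = runif(200))
sim$sightings <- rpois(200, lambda = exp(0.3 + 0.8 * sim$depth))

# Demonstrated with a Poisson GLM so the sketch needs no extra packages
cv_rmspe <- kfold_rmspe(
  sim, "sightings",
  fit_fun  = function(d) glm(sightings ~ depth + temp, data = d, family = poisson),
  pred_fun = function(fit, newd) predict(fit, newdata = newd, type = "response")
)
cv_rmspe
```

To try this with the hurdle models, fit_fun could presumably wrap the hurdle() call with data = d, and pred_fun could stay as it is, since predict() for hurdle objects in "pscl" accepts newdata and type = "response". Whether every training fold then converges is a separate question that this sketch does not address.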


When I then tried to run cross validation on the ZINB (I found a couple
answered questions on Stack Exchange and Cross Validated that seemed to
suggest that the NaNs error was common to hurdle models), I got a
completely different error message, not accompanied by "NA":

Error in solve.default(as.matrix(fit$hessian)) :
  Lapack routine dgesv: system is exactly singular: U[20,20] = 0


I looked this up and found that it could be due to collinearity, so I
calculated the VIFs for my explanatory variables. One, "dc", was over 3, so
I removed it and calculated the VIFs again. All were below 2, so I reran
the models without "dc". Now the cross validation for the hurdle models
returns only "NA", and the cross validation for the ZINB gives an error
similar to the earlier one, plus a NaNs warning:

Error in solve.default(as.matrix(fit$hessian)) :
  system is computationally singular: reciprocal condition number = 8.23449e-18
In addition: Warning message:
In sqrt(diag(vc)[np]) : NaNs produced
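
The VIF check described above can be reproduced in base R along the following lines. This is a sketch, not the code actually used for this post: vif_base is a hypothetical helper name, the data are simulated, and it assumes all predictors are numeric (a factor would need a generalized VIF instead). Each VIF is 1 / (1 - R^2), where R^2 comes from regressing one predictor on all the others.

```r
# Base-R VIF sketch: VIF_j = 1 / (1 - R^2_j), where R^2_j comes from
# regressing predictor j on all other predictors.
# vif_base is a hypothetical helper name; predictors are assumed numeric.
vif_base <- function(X) {
  X <- as.data.frame(X)
  sapply(names(X), function(v) {
    others <- setdiff(names(X), v)
    r2 <- summary(lm(reformulate(others, response = v), data = X))$r.squared
    1 / (1 - r2)
  })
}

# Simulated stand-in predictors; NOT the hdata from this post
set.seed(1)
n <- 100
sim <- data.frame(dr = rnorm(n), temp = rnorm(n))
sim$dc <- sim$dr + rnorm(n, sd = 0.3)  # deliberately correlated with dr

vifs <- vif_base(sim)
round(vifs, 2)
```

In this simulated case the two correlated columns get large VIFs and the independent one stays near 1. Note that low VIFs only rule out collinearity among the predictors; a singular Hessian can have other causes that this check does not detect.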


I'm not sure if the problem is my code, my data (attached), or the package,
but any insight would be extremely appreciated!


Cheers,
-- 
Justine Jackson-Ricketts
Ph.D. Candidate - Costa Lab
UCSC Long Marine Laboratory
100 Shaffer Road
Santa Cruz, CA 95060
