[R-sig-eco] Question: Hurdle model cross validation
Justine Jackson-Ricketts
jdjackso at ucsc.edu
Thu Jun 18 18:46:49 CEST 2015
Hello,
I am trying to run 10-fold cross validation on hurdle models that I fit
with the package "pscl", using the cvFit function from the package
"cvTools". I fit one hurdle model with a negative binomial distribution
and one with a Poisson distribution. At the request of one of my advisors,
I also fit a zero-inflated negative binomial (ZINB) model, also in "pscl".
My code for the hurdle models is:
H1A <- hurdle(sightings ~ depth + temp + turbidity + chla + salinity +
                ph + dr + dc + calves,
              data = hdata, dist = "negbin", link = "logit")
H1B <- hurdle(sightings ~ depth + temp + turbidity + chla + salinity +
                ph + dr + dc + calves,
              data = hdata, dist = "poisson", link = "logit")
And for the ZINB:
Z1 <- zeroinfl(sightings ~ depth + temp + turbidity + chla + salinity +
                 ph + dr + dc + calves,
               data = hdata, dist = "negbin", link = "logit")
I have tried two different sets of code for the cross-validation:
(1) cvFit(H1A, data=hdata, x=NULL, y=hdata$sightings, cost=rmspe, K=10,
    R=10)
This returns "NA".
(2) x <- c(hdata$depth, hdata$temp, hdata$turbidity, hdata$chla,
    hdata$salinity, hdata$ph, hdata$dr, hdata$dc, hdata$calves)
    cvFit(H1A, data=hdata, x=x, y=hdata$sightings, cost=rmspe, K=10, R=10)
This returns "NA" and this warning message:
In sqrt(diag(vc_count)[kx + 1]) : NaNs produced
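For comparison, the 10-fold procedure can also be written out by hand, which makes it easier to see in which fold a fit or a prediction goes wrong. The following is only a minimal sketch under stated assumptions: the data frame `sim` is simulated and merely stands in for hdata, the names `fold`, `pred` and `rmspe_manual` are hypothetical, and a Poisson hurdle with two predictors is used to keep the example small. It is not a description of what cvFit does internally.

```r
library(pscl)  # provides hurdle()

set.seed(1)

# Simulated stand-in for hdata (an assumption, not the attached data):
# many zeros, and counts of at least 1 whenever the hurdle is crossed.
n <- 300
sim <- data.frame(depth = rnorm(n), temp = rnorm(n))
present <- rbinom(n, 1, plogis(-0.5 + 0.8 * sim$depth))
sim$sightings <- present * (1 + rpois(n, exp(0.5 + 0.3 * sim$temp)))

K <- 10
fold <- sample(rep(seq_len(K), length.out = n))  # random fold label per row
pred <- rep(NA_real_, n)                         # out-of-fold predictions

for (k in seq_len(K)) {
  train <- sim[fold != k, ]
  test  <- sim[fold == k, ]
  # Refit on the training folds only
  fit <- hurdle(sightings ~ depth + temp, data = train,
                dist = "poisson", zero.dist = "binomial", link = "logit")
  # Expected count for the held-out rows
  pred[fold == k] <- predict(fit, newdata = test, type = "response")
}

# Root mean squared prediction error over all held-out predictions
rmspe_manual <- sqrt(mean((sim$sightings - pred)^2))
rmspe_manual
```

Inspecting `pred` fold by fold shows which folds, if any, produce NA predictions. Separately, note that c() concatenates its arguments into one long vector, so the x built in attempt (2) has nine times as many elements as hdata has rows instead of nine columns; cbind() or data.frame() would keep one column per predictor.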
A couple of answered questions on Stack Exchange and Cross Validated seemed
to suggest that the NaNs message is common to hurdle models, so I then tried
to run cross validation on the ZINB. That gave a completely different error
message, not accompanied by "NA":
Error in solve.default(as.matrix(fit$hessian)) :
  Lapack routine dgesv: system is exactly singular: U[20,20] = 0
I looked this up and found that it could be due to collinearity, so I
calculated the VIFs for my explanatory variables. One, "dc", was over 3, so
I removed it and calculated the VIFs again. All were below 2, so I reran
the models without "dc". Now the cross validation for the hurdle models
gives only "NA", and the cross validation for the ZINB gives an error
similar to the one before, plus a NaNs warning:
Error in solve.default(as.matrix(fit$hessian)) :
  system is computationally singular: reciprocal condition number =
  8.23449e-18
In addition: Warning message:
In sqrt(diag(vc)[np]) : NaNs produced
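The VIF check described above can be sketched in base R as follows. This is an illustration only: the predictors in `X` are simulated (with "dc" deliberately built to correlate with "dr"), and `vif_manual` is a hypothetical name. It uses the standard definition VIF_j = 1 / (1 - R^2_j), where R^2_j comes from regressing predictor j on all the other predictors.

```r
set.seed(2)

# Simulated predictors (an assumption, not the attached data);
# dc is constructed to be strongly correlated with dr.
n <- 200
X <- data.frame(depth = rnorm(n), temp = rnorm(n), dr = rnorm(n))
X$dc <- 0.9 * X$dr + rnorm(n, sd = 0.3)

# For each predictor v: regress v on the remaining predictors,
# take R^2 from that fit, and convert it to a VIF.
vif_manual <- sapply(names(X), function(v) {
  f  <- reformulate(setdiff(names(X), v), response = v)
  r2 <- summary(lm(f, data = X))$r.squared
  1 / (1 - r2)
})

round(vif_manual, 2)
```

In this simulated example the VIFs for "dr" and "dc" come out well above those for "depth" and "temp", and dropping either one brings the remaining values back close to 1.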
I'm not sure whether the problem is my code, my data (attached), or the
package, but any insight would be greatly appreciated!
Cheers,
--
Justine Jackson-Ricketts
Ph.D. Candidate - Costa Lab
UCSC Long Marine Laboratory
100 Shaffer Road
Santa Cruz, CA 95060