[R-sig-eco] Trouble running full ZINB or ZANB model: NaNs produced and errors in optim()

Fri May 4 00:55:18 CEST 2012

Hi Jason,

On 3 May 2012, at 5:56 PM, J Straka wrote:

> Hello,
> 
> I am attempting to run zero-inflated and hurdle models using the
> pscl() package, but keep encountering problems with running my
> fully-parameterized model and I am not sure why.
> 
> My data consist of measurements of seed set (the response, which is
> zero-inflated and overdispersed) for two treatments and a control
> (B,F, and O) at three elevations (low, mid, and high).  At each
> elevation I also measured average temperature (AvgTemp) and abundance
> of insects (N).  My goal is to construct a model that best describes
> seed set using the four predictors above (and interactions, if
> necessary).  I find that I have no problems running a partial model
> with only my two categorical predictors as follows (just an example
> using hurdle(),but results are the same for zeroinfl()):
> 
> H1B <- hurdle(AVInt ~ Elev + Treatment  |  Elev + Treatment, dist
> ="negbin", link = "logit", data = erytseeds)
> 
> It’s when I try to add the two continuous predictors that things begin
> to get messy;
> 
> H4B <- hurdle(AVInt ~ Elev*Treatment*AvgTemp*N | Elev*Treatment*AvgTemp*N)
> 
> This results in the error:
> Warning message:
> In sqrt(diag(object$vcov)) : NaNs produced
> Error in optim(fn = loglikfun, gr = gradfun, par = c(start$count, start$zero,  :
> non-finite value supplied by optim
> 
> Even when I run the model using one continuous predictor and one
> categorical predictor, or both continuous predictors, I end up with
> NAs in the summary table of the count, but not in the zero-inflated
> portion of the model.  For example:
> 
>> summary(H9B)
> 
> Call:
> zeroinfl(AVInt ~ AvgTemp+N | AvgTemp+N, data = erytseeds, dist
> ="negbin", link = "logit")
> 
> Pearson residuals:
> Min      1Q  Median      3Q     Max
> -0.5807 -0.4566 -0.2791 -0.2791  5.5813
> 
> Count model coefficients (negbin with log link):
> 
>                  Estimate Std. Error z value Pr(>|z|)
> (Intercept)  -3.770120     0.452221  -8.337   <2e-16 ***
> AvgTemp      0.404550      0.173458   2.332   0.0197 *
> N            0.001211            NA      NA       NA
> Log(theta)   1.308387            NA      NA       NA
> 
> Zero-inflation model coefficients (binomial with logit link):
> 
>              Estimate Std. Error z value Pr(>|z|)
> (Intercept) -4.814e+01  8.115e+00  -5.932 2.99e-09 ***
> AvgTemp      4.039e+00  7.158e-01   5.642 1.68e-08 ***
> N            3.010e-03  4.863e-04   6.189 6.04e-10 ***
> 
> ---
> Theta = 3.7002
> Number of iterations in BFGS optimization: 31
> Log-likelihood: -456.4 on 7 Df
> 
>> summary(H10B)
> 
> Call:
> zeroinfl(AVInt ~ Treatment+N | Treatment+N, data = erytseeds, dist
> ="negbin", link = "logit")
> 
> Pearson residuals:
>    Min      1Q  Median      3Q     Max
> -0.6608 -0.4684 -0.3565 -0.2956  6.1730
> 
> Count model coefficients (negbin with log link):
> 
>             Estimate Std. Error z value Pr(>|z|)
> (Intercept) 0.9555134         NA      NA       NA
> TreatmentF  0.1407240  0.1844811   0.763    0.446
> TreatmentO  0.0430166  0.1647207   0.261    0.794
> N           0.0009843         NA      NA       NA
> Log(theta)  1.3049021         NA      NA       NA
> 
> Zero-inflation model coefficients (binomial with logit link):
> 
>              Estimate Std. Error z value Pr(>|z|)
> (Intercept)  0.0685048  1.0056156   0.068  0.94569
> TreatmentF  -0.3822091  0.3631982  -1.052  0.29264
> TreatmentO  -1.1624582  0.3551596  -3.273  0.00106 **
> N            0.0009120  0.0004773   1.911  0.05603 .
> ---
> Theta = 3.6873
> Number of iterations in BFGS optimization: 31
> Log-likelihood: -458.1 on 9 Df
> 
> Reading other posts, I know that my values are all >= 0, and there are
> no missing values that could be causing this, but there ARE lots of
> zeroes.  My sample size is decent (about 35 observations per
> treatment, per elevation).  

Your effective sample size (what determines how many parameters you can fit) is more like the number of non-zero observations you have. Then you can estimate about one tenth that many parameters. So for the model you're trying to fit, you would need 50 to 80 non-zero observations.

I would first try fitting a zero-inflated Poisson because they're less finicky than negative binomial. It's the negative binomial part of your models that seem to be broken.

As Ben said on the mixed models list, it could help to center and scale the continuous predictors using scale().
cheers,
Mollie

> So my questions are essentially as
> follows:
> 
> 1) Why am I getting all these NaNs, and why can’t I run the full
> (saturated) model, including continuous as well as categorical
> predictors and interactions?  How can I begin the process of model
> simplification if I can’t get the full model to run?
> 
> 2) If this is a result of the many zeroes in my data set, is there a
> way to work around this?
> 
> Any advice would be greatly appreciated.
> 
> Jason Straka
> University of Victoria
> 
> _______________________________________________
> R-sig-ecology mailing list
> R-sig-ecology at r-project.org
> https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
>