[R-sig-ME] glmmADMB Warning: Estimated covariance matrix may not be positive definite
Ben Bolker
bbolker at gmail.com
Thu Mar 14 18:24:26 CET 2013
angela.boag at ... <angela.boag at ...> writes:
>
> Hi everyone,
> I'm developing predictive maps of plant species diversity using a
> data set of richness counts from 668 quadrats divided amongst 109
> sites (number of quadrats per site is variable, and depends on site
> size).
> I am modelling richness counts of native and nonnative species using
> a combination of geoclimatic and air photo-derived land use types,
> with the hypothesis that higher nonnative species richness will be
> associated with higher levels of human disturbance, agriculture
> etc., while the opposite is true for native species richness. Site
> is used as a random effect to deal with some of the spatial
> autocorrelation.
> From an original candidate set of 22 predictor variables I kicked
> out those causing correlations of >0.7 (Spearman), then kicked out a
> further set that had VIFs >5 (after Zuur 2009), leaving 11 predictor
> variables which I standardized using z-scores.
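[Editorial note: the screening steps described above can be sketched in base R. The data here are simulated and the variable names are placeholders, not the poster's actual layers; the 0.7 and 5 thresholds mirror the post.]

```r
## Sketch of the screening above: drop one member of each pair with
## |Spearman rho| > 0.7, drop predictors with VIF > 5, then z-score.
set.seed(1)
n <- 668
X <- data.frame(a = rnorm(n))
X$b <- X$a + rnorm(n, sd = 0.3)   # deliberately correlated with a
X$c <- rnorm(n)
X$d <- rnorm(n)

## Spearman screen: flag the second member of each high-correlation pair
rho  <- cor(X, method = "spearman")
high <- which(abs(rho) > 0.7 & upper.tri(rho), arr.ind = TRUE)
drop <- unique(colnames(X)[high[, "col"]])
X2   <- X[, setdiff(names(X), drop), drop = FALSE]

## VIF by hand: regress each remaining predictor on all the others,
## VIF_j = 1 / (1 - R^2_j)
vif1 <- function(j, dat) {
  r2 <- summary(lm(dat[[j]] ~ ., data = dat[-match(j, names(dat))]))$r.squared
  1 / (1 - r2)
}
vifs <- sapply(names(X2), vif1, dat = X2)
X3   <- X2[, vifs <= 5, drop = FALSE]

## z-score standardization, as in the post
X3 <- as.data.frame(scale(X3))
```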
I'm a little bit nervous about dropping predictor variables on
the basis of correlations -- I'd prefer PCA or a penalized approach --
but I agree that this is sometimes a necessary evil. Can you
construct anthropogenic-impact indices that collapse your predictors
into a smaller set?
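[Editorial note: one way to build such indices is to run a PCA over the correlated human-impact variables and use the leading component score as a single predictor. A minimal base-R sketch on simulated data; the variable names are hypothetical:]

```r
## Collapse several correlated impact variables into one PCA-based
## "disturbance index" (simulated data; names are placeholders).
set.seed(2)
n <- 668
z <- rnorm(n)                       # latent disturbance level
impact <- data.frame(AG  = z + rnorm(n, sd = 0.5),
                     DEV = z + rnorm(n, sd = 0.5),
                     RD  = z + rnorm(n, sd = 0.5))

pc <- prcomp(impact, center = TRUE, scale. = TRUE)
summary(pc)                         # PC1 should carry most of the variance
disturb_index <- pc$x[, 1]          # use as a single model predictor,
## e.g. nat_r ~ disturb_index + SLP + ... + (1 | site_code)
```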
> I found both the native and nonnative count data was better
> described by a negative binomial distribution than Poisson, and so
> created a global model using glmmADMB, to which I then applied the
> dredge function from MuMIn to get a model set for averaging:
> #native sp richness
> admb_nr <- glmmadmb(formula=nat_r ~ NR_RD + AG_AR + NR_AG + DEV_AR + SEI_AR +
> NR_SEI + SLP + NOR + EAST + PA_AR +
> TMP_RG + (1|site_code), data=meadow, family="nbinom", zeroInflation = FALSE)
> best_subs_n <- dredge(admb_nr, rank = "AIC", trace = TRUE)
> Dredge runs fine and yields a model list with coefficients and
> standard errors that makes sense, but in my R console (I did trace =
> TRUE and therefore can see each model from Dredge, 2^11 = 2048
> models) the warning "Estimated covariance matrix may not be positive
> definite" appears. The first instance is after the 26th model, then
> it reappears after every 2 or 3 models from then on. For example:
> 233 : glmmadmb(formula = nat_r ~ NOR + NR_RD + NR_SEI + PA_AR + 1,
> data = meadow, family = "nbinom", zeroInflation = FALSE)
> Estimated covariance matrix may not be positive definite
> 0.00017891 0.355147 0.361441 0.37348 0.386851 0.401956 0.446906 0.519341
> 234 : glmmadmb(formula = nat_r ~ AG_AR + NOR + NR_RD + NR_SEI + PA_AR +
> 1, data = meadow, family = "nbinom", zeroInflation = FALSE)
> Estimated covariance matrix may not be positive definite
> 0.000144364 0.405747 0.412888 0.432029 0.452005 0.493285
> Estimated covariance matrix may not be positive definite
> 0.000145352 0.403984 0.411032 0.426545 0.432984 0.460366 0.49558
> I've read that collinearity can cause this, though I feel like I've
> addressed that. Though, as Ben discussed in a reply to one of
> today's messages, could it happen if you have variables derived
> from the same layer that could sum to 1? Indeed, AG_AR and
> DEV_AR are both derived from the same GIS layer, but there are
> several other polygon types not included in this model, so that
> shouldn't be the problem.
> Any insight into what else may cause this warning and what it means
> for my analysis would be much appreciated. Interestingly, when I run
> the model and Dredge for the nonnative richness count data I don't
> get the warning interspersed with each iterative model; rather, it
> appears about 8 times after the 2048th model.
It basically means that the likelihood surface is nearly flat
in some direction, i.e. that some combination of parameter effects
is nearly collinear. It's hard to say more in general, but it
suggests you are still overfitting the model somewhat. It also
doesn't mean the model is _necessarily_ wrong, just that you ought
to be suspicious. My recommendation would be to boil the model
down further (e.g. by collapsing predictors into indices as suggested
above) and make sure that the _qualitative_ conclusions don't change
once you have simplified things to the point where you know you are
reliably fitting the model.
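[Editorial note: the tiny leading number printed with the warning (0.00017891 above) looks like a near-zero eigenvalue, which is the signature of a nearly flat likelihood direction. The symptom can be reproduced with two almost-identical predictors; this is an illustration on simulated data, not the poster's model:]

```r
## Two nearly collinear predictors make the (scaled) information
## matrix X'X nearly singular: one eigenvalue collapses toward zero.
set.seed(3)
n  <- 200
x1 <- rnorm(n)
x2 <- x1 + rnorm(n, sd = 0.01)       # x2 is almost a copy of x1
y  <- rnorm(n)
fit <- lm(y ~ x1 + x2)

X    <- model.matrix(fit)
info <- crossprod(X)                 # X'X, proportional to the Hessian
D    <- diag(1 / sqrt(diag(info)))   # rescale to unit diagonal
ev   <- eigen(D %*% info %*% D, symmetric = TRUE)$values
min(ev)                              # near zero: a nearly flat direction
```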
I'm also not entirely happy about dredging, although I can appreciate
that it is essentially a route to penalized regression ...
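[Editorial note: a penalized fit shrinks all coefficients rather than searching 2^11 submodels. Sketched here as plain ridge regression in base R, with the GLMM and negative-binomial machinery omitted; this only illustrates the shrinkage idea, not a drop-in replacement for the poster's model:]

```r
## Ridge regression by hand: penalize all 11 standardized coefficients
## instead of dredging every submodel. Simulated data.
set.seed(4)
n <- 668; p <- 11
X    <- scale(matrix(rnorm(n * p), n, p))
beta <- c(1, -0.5, rep(0, p - 2))          # a few real effects
y    <- X %*% beta + rnorm(n)

ridge <- function(X, y, lambda) {
  ## closed-form ridge solution: (X'X + lambda I)^{-1} X'y
  solve(crossprod(X) + lambda * diag(ncol(X)), crossprod(X, y))
}
b_ols   <- ridge(X, y, 0)                  # ordinary least squares
b_ridge <- ridge(X, y, 50)                 # penalized: coefficients shrink
sum(b_ridge^2) < sum(b_ols^2)              # shrinkage in action
```

In practice lambda would be chosen by cross-validation, and a package handling penalized GLMs would be used rather than this bare linear-model sketch.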