[R-sig-ME] Modelling overdispersed, high-zero Abundance data
fernando barbero
fbarbero at bariloche.inta.gov.ar
Mon Sep 12 17:10:44 CEST 2011
Dear Gillian: there is a book from Alain Zuur called "Mixed effects models
and extensions in ecology with R" that has a chapter (11) that deals with
zero inflated models. What I would do if I were you I Would study the source
of the zeros, may be you don t find animal species in certain sites because
it is not appropriate to find any animal in those sites, for example looking
for wild animals in an urbanized area, if you have that kind of zeros you
should remove them. I have seen this approach also in a paper from 1996
which described the eucalyptus (a tree species) dispersion in Australia
(Austin M & Meyers J, 1996: Current approaches to modeling the environmental
niche of eucalypts: implication for management of forest biodiversity.
Forest Ecology and Management, 85, 95-106.)
Best regards
Fernando
-----Mensaje original-----
En nombre de Gillian Eastwood
Enviado el: lunes, 12 de septiembre de 2011 06:23 a.m.
Para: r-sig-mixed-models at r-project.org
Asunto: [R-sig-ME] Modelling overdispersed, high-zero Abundance data
Dear list,
I was hoping to get some consensus on an analysis Im doing because at
present I seem to going round in circles trying different techniques
depending on who I've last spoken to. Im also abit confused between the
different options of mixed models that there are how to test them. To
explain:
Im trying to model species abundance from dataset that has both spatial
(site, distance to water, distance to village) and temporal (ie.
trap-sites monitored 5+ times on different dates across 5 years, some more
than
others) elements to it. The ideal
outcome would to be able to predict for additional site that werent
sampled, and to include a visual map of hi/med/lo abundance i.e. some
presentable to management.
The
data is count data therefore Im using Poison family of errors (also
tried Neg Bin)
The data seems OverdispersedThere
are a lot of Zeros (since the species often arent at the same site)
therefore I did try zero-inflated (zinb) models, and also
(log(count+1))
transforming the outcome.
Some
covariates seem to have a non-linear relationship to the response var,
so
I wondered if some of the techniques I've considered aren't suitable I
have 1070 observations, 38 sites, 7 habitats, plus climatic covariates.
Note
that not all sites were monitored regularly (some very often/some
hardly
ever). Also I suspect the data is biased depending on the time of year
when those few monitorings took place, i.e. if it took place in summer
there seem some very large sites averages, and if all in winter, zeros.I
have another 300 observations kept aside which I was using to check model
predictions upon,
Firstly I have tried:
GLM with Neg Bin errors: (But this fails to reflect Site differences,and
predicted badly)
N1<-glm.nb( log(Count+1) ~ VegZone + Elevation + Tide + Temp +
RelHum + Rain)
GLM with Quasipoisson structure: (Recently Ive been trying to investigate
the spatial dependency of the data by then creating an
geoR object and using a variogram / likfit spatial model. For this I first
had to convert my dataset to get averages per site (rather than per
obervation) and did this separately for summer / winter
seasons. For one season there was
spatial dependency, but I believe this method fell down because of the high
amount of zeros in the data)
WinterC<-glm(CountWinter~LAT+LONG+VegZone+Elevation+TempW+RainW,family="quas
ipoisson")
Zero-inflated Neg bin GLM (ZINB)
(this seemed to work but I wouldn't mind some advice as to how to best
decide which variables should go in the zero component. At the moment i'm
just guessing factors that make biological sense, but several other
combinations seem to work ok also) The problem here seems that the
predictions based on
another dataset were a fair bit higher than observed in the field.
Library(pscl)f1<-formula(Count~VegZone+Elevation+Month+
Tide+Temp | RelHum+Rain) Zip1<-zeroinfl(f1,dist="poisson",
link="logit") Znb1<-zeroinfl(f1,dist="negbin", link="logit")(Vuong
test showed * for ZeroNegBin over StdNegBin and lrtest(Zip1,Znb1) showed
NegBin better than Poisson) > summary(Znb1)
Call:zeroinfl(formula
= f1, dist = "negbin", link = "logit") Pearson
residuals: Min
1Q Median 3Q
Max -0.58531
-0.49502 -0.40727 -0.07085 18.12045 Count
model coefficients (negbin with log link): Estimate Std.
Error z value
Pr(>|z|) (Intercept) -4.721674 1.354796
-3.485 0.000492 ***VegZone1 0.404217 0.537177
0.752 0.451760 VegZone2 2.221035 0.567481
3.914 9.08e-05 ***VegZone3 1.539401 0.567190
2.714 0.006646 ** VegZone4 -2.680181
0.602345 -4.450 8.60e-06 ***VegZone5 -0.288254
0.229131 -1.258 0.208381 VegZone6 2.301781
0.596701 3.858 0.000115 ***Elevation 0.008885 0.001920
4.626 3.72e-06 ***Month -0.178041 0.026461
-6.728 1.71e-11 ***Tide 0.832042 0.305550
2.723 0.006467 ** Temp 0.182989 0.039613
4.619 3.85e-06 ***Log(theta) -1.038962 0.087121 -11.926 < 2e-16
*** Zero-inflation
model coefficients (binomial with logit link): Estimate Std.
Error z value
Pr(>|z|) (Intercept) 7.73138
3.05750 2.529 0.01145 * RelHum -0.10398 0.03650
-2.848 0.00439 **Rain -0.05182
0.02607 -1.987 0.04689 * ---Signif.
codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 Theta = 0.3538
Number of iterations in BFGS optimization: 32 Log-likelihood:
-2606 on 15 Df lrtest(Znb1) -
Likelihood ratio test #Df LogLik
Df Chisq Pr(>Chisq) 1 15
-2605.6 2 3 -2773.4
-12 335.53 < 2.2e-16 ***
LMER (with
Site and Year as a random factor, plus an Obs level to deal with
Overdispersion)
(As there isn't a predict function in lmer I wrote my own based on
coefficients of fixef(); the model didnt predict well even against itself,
and a plot(fitted,residuals) indicated clustering peaking around zero.
Obs<-1:nrow(X)Mod1<-lmer(Count~VegZone+Tide+RelHum+Rain+Temp+(1|Obs)+(1|SITE
)+(1|Year),family=poisson)summary(Mod1)
Generalized
linear mixed model fit by the Laplace
approximation Formula:
Count ~ VegZone + Elevation + Tide + RelHum + Rain + Temp + (1 | Obs) +
(1 | SITE) + (1 | Year) AIC
BIC logLik deviance 2959 3034
-1465 2929Random
effects: Groups Name Variance Std.Dev. Obs
(Intercept) 2.80118 1.67367 SITE
(Intercept) 0.77605 0.88094 Year
(Intercept) 1.20464 1.09756 Number
of obs: 1073, groups: Obs, 1073; SITE, 38; Year, 6 Fixed
effects: Estimate Std. Error z value
Pr(>|z|) (Intercept) -19.675884 2.387821
-8.240 < 2e-16 ***VegZone1 -0.758027 1.226631
-0.618 0.537 VegZone2 1.380153 1.393533
0.990 0.322 VegZone3 1.010145 1.344949
0.751 0.453 VegZone4
-1.432985 1.228785 -1.166
0.244 VegZone5
0.099095 0.661508 0.150
0.881 VegZone6 -0.736065 1.312398
-0.561 0.575 Elevation 0.004610 0.004037
1.142 0.254 Tide 0.479253 0.342998
1.397 0.162 RelHum 0.063506 0.016140
3.935 8.33e-05 ***Rain
0.003029 0.000576 5.259 1.45e-07 ***Temp 0.527903 0.039350
13.416 < 2e-16 ***---Signif.
codes: 0 *** 0.001 ** 0.01 * 0.05 . 0.1 1 Correlation
of Fixed Effects: (Intr) Veg1 Veg2
Veg3 Veg4 Veg5
Veg6 Elevtn Tide RelHum
RainVeg1 -0.496
Veg2 -0.456
0.852
Veg3
-0.490
0.887 0.857
Veg4
0.395 -0.660 -0.649 -0.676
Veg5 -0.311
0.526 0.497 0.518 -0.198 Veg6
-0.498
0.878 0.848 0.883 -0.665
0.515
Elevation -0.518
0.896 0.870 0.907 -0.826
0.422 0.894 Tide -0.194
0.006 0.008 0.008 -0.027
0.037 0.007 0.019 RelHum -0.693
0.012 -0.014 0.001 -0.024
-0.014 0.017 0.009 -0.095 Rain 0.138
-0.002 -0.004 -0.002 0.014 -0.014 -0.002
-0.030 -0.155 -0.053 Temp -0.567 -0.039 -0.058 -0.046 0.025
0.042 -0.036 -0.017 -0.021 0.321
-0.164
glmmPQL (because of the Overdispersion, but then the
plots, as in the Owls data egs confirm the zero-inflatedness)
QP1<-
glmmPQL(Count ~ x+y+z, random = ~1 |SITE, family =quasipoisson, data = X)
INLA (So to try and deal with Zero-inflatedness and Overdispersion and
Random factors - even more recently I am looking at this as I was advised
this could cope with the zero-inflated aspect; the NegBin model wont run,
but the Poisson will. I understand I have to check that the quantile
intervals include 0 to be
significant. With this INLA model should the random factors be those to
represent the Zero's)
formula1<-Count~Elevation+Tide+RelHum+Rain+Temp+f(SITE,model="iid")+f(Year,m
odel="iid")
Z1<-inla(formula1,family="zeroinflatednbinomial1",data=X)
Warning
message: In is.vector(object) : NAs
introduced by coercion
Z2<-inla(formula1,family="zeroinflatedpoisson0",data=X)summary(Z2)
Call:"inla(formula = formula1, family =
\"zeroinflatedpoisson0\", data = X)" Time used: Pre-processing Running
inla Post-processing Total
0.6250000 7.3440001 0.2029998 8.1719999 Fixed effects:
mean sd
0.025quant 0.5quant 0.975quant kld(Intercept) -4.8645490672
0.5116398678 -5.872449e+00
-4.864323731 -3.8591231353 1.441839e-03Elevation 0.0019088062
0.0009895353
-4.055322e-05 0.001906647 0.0038696403 3.746575e-04Tide
0.4133481865 0.0583345762 2.987825e-01
0.413348241 0.5277415863
2.357353e-06RelHum 0.0201337858 0.0027565909 1.472445e-02
0.020132161 0.0255446762
3.705232e-05Rain 0.0002081704 0.0000749166 5.988189e-05
0.000208579 0.0003539284
1.432702e-07Temp 0.1722034238 0.0073888403 1.577326e-01
0.172188544 0.1867410550
5.566846e-05 Random effects:Name
Model Max KLD SITE IID model
Year IID model
Model hyperparameters:
mean sd
0.025quant 0.5quant 0.975quantZero-Probability parameter for zero-inflated
Poisson_0
0.46385 0.01501 0.43458 0.46376 0.49352
Precision for SITE 0.96005
0.23775 0.56946 0.93505 1.49789
Precision for Year 3.04348
1.63155 0.89671 2.72199 7.09648
Expected number of effective parameters(std dev):
44.03(0.195)Number of equivalent replicates : 24.37 Marginal Likelihood:
-8250.01 Warning: Interpret the marginal likelihood with care if the prior
model is improper.
This INLA model seems a good option
but Im unsure how to check the quality or make predictions.
And then GLMER and MCMCglmm seem options also.
I have a rather short timeframe by which to complete this. I think my data
is quite hard to explain on the available factors anyway but I was hoping
the list might be able indicate which of these I seem to be going in the
right direction to persevere with.
Many thanks in advance for any advice that you might offer
Gillian
[[alternative HTML version deleted]]
More information about the R-sig-mixed-models
mailing list