[R-sig-ME] Problems in using GEEs
esnm2 at mnhn.fr
esnm2 at mnhn.fr
Wed Jan 4 17:35:31 CET 2012
Dear list
I?m puzzled with running GEEs with a small dataset. I use R 2.11.1
My data (a subset given below- ?muskerbis?) are seasonal abundances of
two small vertebrate populations on two distinct sites (gui and coc)
differing by habitat quality (for the animals) and collected over 10
years (1999-2010, except 2 years ? 2004 and 2006) that I want to
correlate with climatic variables. I suspect a serial correlation, and
tried running a GLS with a corAR1 structure within the ?ile? group,
but even with ranking the dependent variable (absmay ? summer animal
abundances), i didn?t reached good distribution in residuals and
homoskedasticity. The best I could get was using a GLM neg.bin
approach ? but couldn?t include an AR1. I therefore tried GEEs poisson
with rounded square rooted dependent variable and ?ile? as cluster
using yags, geepack, and gee packages. The results are below:
> print(muskerbis)
enr yr ile absmay abssep ps
1 1999 bgui 288 89 200.8
2 2000 bgui 157 1 133.6
3 2001 bgui 186 34 145.4
4 2002 bgui 102 9 83.4
5 2003 bgui 27 7 134.2
6 2005 bgui 52 6 129.2
7 2007 bgui 75 6 113.8
8 2008 bgui 272 19 254.6
9 2009 bgui 55 16 111.8
10 2010 bgui 104 7 171
11 1999 acoc 361 83 200.8
12 2000 acoc 173 117 133.6
13 2001 acoc 187 29 145.4
14 2002 acoc 292 48 83.4
15 2003 acoc 382 34 134.2
16 2005 acoc 313 138 129.2
17 2007 acoc 202 3 113.8
18 2008 acoc 312 14 254.6
19 2009 acoc 239 4 111.8
20 2010 acoc 211 58 171
cile=as.factor(ile) ##transform « ile » as a factor
Using GEEPACK:
guicocgeeglmsumps1=geeglm(as.integer(sqrt(absmay))~cile+ps+cile:ps,
corstr="ar1", id=cile, family=poisson(link = "log"),
na.action=na.omit, data=muskerbis)
summary(guicocgeeglmsumps1)
Call:
geeglm(formula = as.integer(sqrt(absmay)) ~ cile + ps + cile:ps,
family = poisson(link = "log"), data = muskerbis, na.action = na.omit,
id = cile, corstr = "ar1")
Coefficients:
Estimate Std.err Wald Pr(>|W|)
(Intercept) 2.621597 0.000000 Inf <2e-16 ***
cilebgui -1.002342 0.000000 Inf <2e-16 ***
ps 0.000935 0.000000 Inf <2e-16 ***
cilebgui:ps 0.003796 0.000000 Inf <2e-16 ***
---
Signif. codes: 0 ?***? 0.001 ?**? 0.01 ?*? 0.05 ?.? 0.1 ? ? 1
Estimated Scale Parameters:
Estimate Std.err
(Intercept) 0.457 0.171
Correlation: Structure = ar1 Link = identity
Estimated Correlation Parameters:
Estimate Std.err
alpha 0.235 0.093
Number of clusters: 2 Maximum cluster size: 10
Using YAGS:
guicocyagssumtps1<-yags(as.integer(sqrt(absmay))~cile+ps+cile:ps,
corstr="ar1", cor.met=rank(yr), id=cile, family=poisson,
data=muskerbis, alphainit=0.)
guicocyagssumtps1
YAGS (yet another GEE solver) $Date: 2004/10/22 18:49:23 $
Call:
yags(formula = as.integer(sqrt(absmay)) ~ cile + ps + cile:ps,
id = cile, cor.met = rank(yr), family = poisson, corstruct = "ar1",
alphainit = 0, data = muskerbis)
Regression estimates:
est. naive s.e. naive z sand. s.e. sand. z
(Intercept) 2.60549 0.17431 14.95 1.90e-06 1369761
cilebgui -0.97181 0.28518 -3.41 4.45e-06 -218230
ps 0.00104 0.00095 1.10 1.28e-08 81387
cilebgui:ps 0.00362 0.00146 2.47 2.36e-08 153630
Working correlation model: ar1
alpha est: 0.654
NULL
Pan QIC(R): -3365.819
QLS: 38.87
Rotnitzky-Jewell: 0, 0
yags/R: $Id: yags.R,v 1.5 2004/10/22 18:49:23 stvjc Exp $
Using GEE
guicocgeesumps1<-gee(as.integer(sqrt(absmay))~cile+ps+cile:ps,
corstr="AR-M", Mv=1, id=cile, family=poisson(link = "log"),
na.action=na.omit, data=muskerbis)
summary(guicocgeesumps1)
GEE: GENERALIZED LINEAR MODELS FOR DEPENDENT DATA
gee S-function, version 4.13 modified 98/01/27 (1998)
Model:
Link: Logarithm
Variance to Mean Relation: Poisson
Correlation Structure: AR-M , M = 1
Call:
gee(formula = as.integer(sqrt(absmay)) ~ cile + ps + cile:ps,
id = cile, data = muskerbis, na.action = na.omit, family =
poisson(link = "log"),
corstr = "AR-M", Mv = 1)
Summary of Residuals:
Min 1Q Median 3Q Max
-4.562 -1.750 -0.649 2.364 3.422
Coefficients:
Estimate Naive S.E. Naive z Robust S.E. Robust z
(Intercept) 2.60935 0.17772 14.68 9.27e-07 2816177
cilebgui -0.98027 0.28867 -3.40 2.32e-06 -422088
ps 0.00102 0.00100 1.01 6.49e-09 156718
cilebgui:ps 0.00367 0.00154 2.38 1.31e-08 279242
Estimated Scale Parameter: 0.572
Number of Iterations: 3
Working Correlation
[,1] [,2] [,3] [,4] [,5] [,6] [,7]
[,8] [,9] [,10]
[1,] 1.000000 0.369769 0.136729 0.05056 0.01869 0.00691 0.00256
0.000945 0.000349 0.000129
I believe to have a good design of my model, as geepack ?tells? me I
have 2 clusters with each 10 obs. residuals rather well behave, so for
homoskedasticity (not shown). However, I don?t understand why it does
not produce estimation of the se?s estimates and corresponding Wald
statistics;
I also do not understand the differences in estimating alpha between
all packages: 0.235 with geepack, 0.654 with yags, and 0.370 with gee
??? also for the scale parameter given in gee (0.572) and yags (0.457)
??
Moreover, although estimates are reliable between packages, it seems
that naïve z-values from yags and gee perform better than from
robust-z ???
Finally, could also someone give me some advice on what to use for
?fixed? effects selection ? more specifically how to compute a
?working Wald test? or a ?naïve likelihood ratio test? as proposed by
on page 135 by Ballinger 2004 (Organizational Research Methods 2004 7:
127-150 DOI: 10.1177/1094428104263672) ? as I may have such situation
here (ie fewer covariates than obs per group ?)
Many thanks by advance for helping me (I?m at the limit of my
knowledge in maths and informatics ?) and happy new year to everyone
ben
More information about the R-sig-mixed-models
mailing list