[R-sig-ME] Problems in using GEEs

esnm2 at mnhn.fr esnm2 at mnhn.fr
Wed Jan 4 17:35:31 CET 2012


Dear list

I?m puzzled with running GEEs with a small dataset. I use R 2.11.1
My data (a subset given below- ?muskerbis?) are seasonal abundances of  
two small vertebrate populations on two distinct sites (gui and coc)  
differing by habitat quality (for the animals) and collected over 10  
years (1999-2010, except 2 years ? 2004 and 2006) that I want to  
correlate with climatic variables. I suspect a serial correlation, and  
tried running a GLS with a corAR1 structure within the ?ile? group,  
but even with ranking the dependent variable (absmay ? summer animal  
abundances), i didn?t reached good distribution in residuals and  
homoskedasticity. The best I could get was using a GLM neg.bin  
approach ? but couldn?t include an AR1. I therefore tried GEEs poisson  
with rounded square rooted dependent variable and ?ile? as cluster  
using yags, geepack, and gee packages. The results are below:

> print(muskerbis)
enr	yr	ile	absmay	abssep	ps
1	1999	bgui	288	89	200.8
2	2000	bgui	157	1	133.6
3	2001	bgui	186	34	145.4
4	2002	bgui	102	9	83.4
5	2003	bgui	27	7	134.2
6	2005	bgui	52	6	129.2
7	2007	bgui	75	6	113.8
8	2008	bgui	272	19	254.6
9	2009	bgui	55	16	111.8
10	2010	bgui	104	7	171
11	1999	acoc	361	83	200.8
12	2000	acoc	173	117	133.6
13	2001	acoc	187	29	145.4
14	2002	acoc	292	48	83.4
15	2003	acoc	382	34	134.2
16	2005	acoc	313	138	129.2
17	2007	acoc	202	3	113.8
18	2008	acoc	312	14	254.6
19	2009	acoc	239	4	111.8
20	2010	acoc	211	58	171

cile=as.factor(ile)	##transform « ile » as a factor

Using GEEPACK:
guicocgeeglmsumps1=geeglm(as.integer(sqrt(absmay))~cile+ps+cile:ps,  
corstr="ar1", id=cile, family=poisson(link = "log"),
na.action=na.omit, data=muskerbis)
summary(guicocgeeglmsumps1)
Call:
geeglm(formula = as.integer(sqrt(absmay)) ~ cile + ps + cile:ps,
     family = poisson(link = "log"), data = muskerbis, na.action = na.omit,
     id = cile, corstr = "ar1")
Coefficients:
              Estimate   Std.err Wald Pr(>|W|)
(Intercept)  2.621597  0.000000  Inf   <2e-16 ***
cilebgui    -1.002342  0.000000  Inf   <2e-16 ***
ps           0.000935  0.000000  Inf   <2e-16 ***
cilebgui:ps  0.003796  0.000000  Inf   <2e-16 ***
---
Signif. codes:  0 ?***? 0.001 ?**? 0.01 ?*? 0.05 ?.? 0.1 ? ? 1
Estimated Scale Parameters:
             Estimate Std.err
(Intercept)    0.457   0.171
Correlation: Structure = ar1  Link = identity
Estimated Correlation Parameters:
       Estimate Std.err
alpha    0.235   0.093
Number of clusters:   2   Maximum cluster size: 10

Using YAGS:
guicocyagssumtps1<-yags(as.integer(sqrt(absmay))~cile+ps+cile:ps,  
corstr="ar1", cor.met=rank(yr), id=cile, family=poisson,
data=muskerbis, alphainit=0.)
guicocyagssumtps1
YAGS (yet another GEE solver) $Date: 2004/10/22 18:49:23 $
Call:
yags(formula = as.integer(sqrt(absmay)) ~ cile + ps + cile:ps,
     id = cile, cor.met = rank(yr), family = poisson, corstruct = "ar1",
     alphainit = 0, data = muskerbis)
Regression estimates:
                 est. naive s.e. naive z sand. s.e. sand. z
(Intercept)  2.60549    0.17431   14.95   1.90e-06 1369761
cilebgui    -0.97181    0.28518   -3.41   4.45e-06 -218230
ps           0.00104    0.00095    1.10   1.28e-08   81387
cilebgui:ps  0.00362    0.00146    2.47   2.36e-08  153630
Working correlation model: ar1
alpha est: 0.654
NULL
Pan QIC(R): -3365.819
QLS: 38.87
Rotnitzky-Jewell: 0, 0
yags/R: $Id: yags.R,v 1.5 2004/10/22 18:49:23 stvjc Exp $

Using GEE
guicocgeesumps1<-gee(as.integer(sqrt(absmay))~cile+ps+cile:ps,  
corstr="AR-M", Mv=1, id=cile, family=poisson(link = "log"),
na.action=na.omit, data=muskerbis)
summary(guicocgeesumps1)
GEE:  GENERALIZED LINEAR MODELS FOR DEPENDENT DATA
  gee S-function, version 4.13 modified 98/01/27 (1998)
Model:
  Link:                      Logarithm
  Variance to Mean Relation: Poisson
  Correlation Structure:     AR-M , M = 1
Call:
gee(formula = as.integer(sqrt(absmay)) ~ cile + ps + cile:ps,
     id = cile, data = muskerbis, na.action = na.omit, family =  
poisson(link = "log"),
     corstr = "AR-M", Mv = 1)
Summary of Residuals:
    Min     1Q Median     3Q    Max
-4.562 -1.750 -0.649  2.364  3.422
Coefficients:
             Estimate Naive S.E. Naive z Robust S.E. Robust z
(Intercept)  2.60935    0.17772   14.68    9.27e-07  2816177
cilebgui    -0.98027    0.28867   -3.40    2.32e-06  -422088
ps           0.00102    0.00100    1.01    6.49e-09   156718
cilebgui:ps  0.00367    0.00154    2.38    1.31e-08   279242
Estimated Scale Parameter:  0.572
Number of Iterations:  3
Working Correlation
           [,1]     [,2]     [,3]    [,4]    [,5]    [,6]    [,7]      
[,8]     [,9]    [,10]
  [1,] 1.000000 0.369769 0.136729 0.05056 0.01869 0.00691 0.00256  
0.000945 0.000349 0.000129


I believe to have a good design of my model, as geepack ?tells? me I  
have 2 clusters with each 10 obs. residuals rather well behave, so for  
homoskedasticity (not shown). However, I don?t understand why it does  
not produce estimation of the se?s estimates and corresponding Wald  
statistics;
I also do not understand the differences in estimating alpha between  
all packages: 0.235 with geepack, 0.654 with yags, and 0.370 with gee  
??? also for the scale parameter given in gee (0.572) and yags (0.457)  
??
Moreover, although estimates are reliable between packages, it seems  
that naïve z-values from yags and gee perform better than from  
robust-z ???
Finally, could also someone give me some advice on what to use for  
?fixed? effects selection ? more specifically how to compute a  
?working Wald test? or a ?naïve likelihood ratio test? as proposed by  
on page 135 by Ballinger 2004 (Organizational Research Methods 2004 7:  
127-150 DOI: 10.1177/1094428104263672) ? as I may have such situation  
here (ie fewer covariates than obs per group ?)

Many thanks by advance for helping me (I?m at the limit of my  
knowledge in maths and informatics ?) and happy new year to everyone
ben




More information about the R-sig-mixed-models mailing list