[R-sig-eco] compar.gee- Generalized estimating equations and inflated type 1 errors
Tom Oliver
toliver at ceh.ac.uk
Tue Feb 17 17:27:28 CET 2009
Hello Helplist,
I was wondering if anyone could shed some light on this problem..
I have been using the compar.gee package (Package ape version 2.2) for a comparative analaysis using a binary categorical variable.
My phylogenetic tree has 42 tips with branch lengths set to 1 and the response variable I am using is slightly negatively skewed (i.e. non-normal). However, from background reading apparently GEEs are fairly robust to such skew.
The output suggests my explanatory variable is highly correlated with my response across species (output #1 below). When I run a normal anova, however, my result is nowhere near the p<0.05 significance level (output #2 below). I was quite suspicious about the results so I decided to randomly shuffle the response and explanatory vectors and repeat the tests 30 times. For the normal ANOVA 2 out of 30 results were significant (p<0.05 without bonferroni). But for the GEE, 17 out of 30 results were still significant.
Clearly something is not quite right! I have also tried using a continuous explanatory variable and the GEE model still seems to overestimate significance. I can find very little material detailing the use or limits of GEEs in R. The R package help page contains no information on GEE assumptions etc., nor does the book by E.Paradis- Analysis of Phylogenetics and Evolution with R.
I have noticed that if I specificy the GEE regressions to go through the origin (output #3) then the results are far less significant. Indeed, zero out of the 30 tests are significant. However, in the example in the compar.gee help page it suggests that GEE regressions NOT through the origin give more comparable results to phylogentically independent contrasts (?pic help page example).
Any help you can offer is much appreciated.
Tom Oliver
########
output #1
print(compar.gee(slope~explanatory,data2,phy=tr2))
Beginning Cgee S-function, @(#) geeformula.q 4.13 98/01/27
running glm to get initial regression estimate
(Intercept) explanatoryy
-0.22694925 0.02282178
Call:
formula: slope ~ explanatory
Number of observations: 42
Model:
Link: identity
Variance to Mean Relation: gaussian
Summary of Residuals:
Min 1Q Median 3Q Max
-6.1949077 -0.5137222 0.2786307 0.5084768 3.7319044
Coefficients:
Estimate S.E. t Pr(T > |t|)
(Intercept) 0.3961744 0.7632086 0.5190905 0.6241267392
explanatoryy -0.8182404 0.1159283 -7.0581610 0.0006200716
Estimated Scale Parameter: 2.100696
"Phylogenetic" df (dfP): 7.439024
#######'################
output #2
print(summary(lm(slope~explanatory)))
Call:
lm(formula = slope ~ explanatory)
Residuals:
Min 1Q Median 3Q Max
-6.4128 -0.0539 0.1215 0.2905 3.5140
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.22695 0.38631 -0.587 0.560
explanatoryy 0.02282 0.46490 0.049 0.961
Residual standard error: 1.393 on 40 degrees of freedom
Multiple R-squared: 6.024e-05, Adjusted R-squared: -0.02494
F-statistic: 0.00241 on 1 and 40 DF, p-value: 0.961
##############
output #3
print(compar.gee(slope~explanatory-1,data2,phy=tr2))
Beginning Cgee S-function, @(#) geeformula.q 4.13 98/01/27
running glm to get initial regression estimate
explanatoryn explanatoryy
-0.2269493 -0.2041275
Call:
formula: slope ~ explanatory - 1
Number of observations: 42
Model:
Link: identity
Variance to Mean Relation: gaussian
Summary of Residuals:
Min 1Q Median 3Q Max
-6.1949077 -0.5137222 0.2786307 0.5084768 3.7319044
Coefficients:
Estimate S.E. t Pr(T > |t|)
explanatoryn 0.3961744 0.7632086 0.5190905 0.6241267
explanatoryy -0.4220661 0.7595631 -0.5556695 0.6005236
Estimated Scale Parameter: 2.100696
"Phylogenetic" df (dfP): 7.439024
>
########
end
Tom Oliver
Biological Records Centre
Centre for Ecology and Hydrology (CEH)
Maclean Building,
Benson Lane,
Crowmarsh Gifford,
Wallingford, Oxfordshire, OX10 8BB
Tel: 01491 692517
--
This message (and any attachments) is for the recipient ...{{dropped:6}}
More information about the R-sig-ecology
mailing list