[R-sig-phylo] compar.gee- Generalized estimating equations and inflated type 1 errors
Tom Oliver
toliver at ceh.ac.uk
Wed Feb 18 12:45:14 CET 2009
Hi Emmanuel,
Thanks for your post. I can accept that sometimes normal OLS regressions will give non-significant correlations yet after accounting for covariance among species using phylogenetic methods then the traits may show significant correlations. However, I am confused how I can randomly shuffle the response and explanatory variables in the tree, (in theory reducing correlation to the lowest it possibly can be), yet still get significant correlations on more than half of occasions (rather than 1 in 20 at p<0.05). Shouldn't the phylogenetic test be telling me the significance of the correlation given the struture of the tree, rather than the tree structure influencing estimates of significance?
Could it be that there are issues with the distribution of my response variable? As I mentioned it is unimodal with a negative skew, yet according to a Shapiro-Wilk Normality Test it is non-normal. Would this bias the results do you think?
shapiro.test(temp.slope) # signif dif from normal dist
Shapiro-Wilk normality test
data: temp.slope
W = 0.6392, p-value = 6.403e-09
Many thanks,
Tom
>>> Emmanuel Paradis <Emmanuel.Paradis at mpl.ird.fr> 02/18/09 9:52 am >>>
Hi Tom,
We had a discussion related to this topic last year. Here's my main comment:
https://stat.ethz.ch/pipermail/r-sig-phylo/2008-April/000070.html
You may have a look also at other messages in the same thread, of course.
HTH
EP
Tom Oliver <toliver at ceh.ac.uk> a Ã©critÂ :
>
> Hello Helplist,
>
> I was wondering if anyone could shed some light on this problem..
>
> I have been using the compar.gee package (Package ape version 2.2)
> for a comparative analaysis using a binary categorical variable.
> My phylogenetic tree has 42 tips with branch lengths set to 1 and
> the response variable I am using is slightly negatively skewed (i.e.
> non-normal). However, from background reading apparently GEEs are
> fairly robust to such skew.
>
> The output suggests my explanatory variable is highly correlated
> with my response across species (output #1 below). When I run a
> normal anova, however, my result is nowhere near the p<0.05
> significance level (output #2 below). I was quite suspicious about
> the results so I decided to randomly shuffle the response and
> explanatory vectors and repeat the tests 30 times. For the normal
> ANOVA 2 out of 30 results were significant (p<0.05 without
> bonferroni). But for the GEE, 17 out of 30 results were still
> significant.
>
> Clearly something is not quite right! I have also tried using a
> continuous explanatory variable and the GEE model still seems to
> overestimate significance. I can find very little material detailing
> the use or limits of GEEs in R. The R package help page contains no
> information on GEE assumptions etc., nor does the book by
> E.Paradis- Analysis of Phylogenetics and Evolution with R.
>
> I have noticed that if I specificy the GEE regressions to go through
> the origin (output #3) then the results are far less significant.
> Indeed, zero out of the 30 tests are significant. However, in the
> example in the compar.gee help page it suggests that GEE regressions
> NOT through the origin give more comparable results to
> phylogentically independent contrasts (?pic help page example).
>
> Any help you can offer is much appreciated.
> Tom Oliver
>
>
> ########
>
> output #1
>
> print(compar.gee(slope~explanatory,data2,phy=tr2))
> Beginning Cgee S-function, @(#) geeformula.q 4.13 98/01/27
> running glm to get initial regression estimate
> (Intercept) explanatoryy
> -0.22694925 0.02282178
>
> Call:
> formula: slope ~ explanatory
>
> Number of observations: 42
>
> Model:
> Link: identity
> Variance to Mean Relation: gaussian
>
> Summary of Residuals:
> Min 1Q Median 3Q Max
> -6.1949077 -0.5137222 0.2786307 0.5084768 3.7319044
>
>
> Coefficients:
> Estimate S.E. t Pr(T > |t|)
> (Intercept) 0.3961744 0.7632086 0.5190905 0.6241267392
> explanatoryy -0.8182404 0.1159283 -7.0581610 0.0006200716
>
> Estimated Scale Parameter: 2.100696
> "Phylogenetic" df (dfP): 7.439024
>
> #######'################
>
> output #2
>
> print(summary(lm(slope~explanatory)))
>
> Call:
> lm(formula = slope ~ explanatory)
>
> Residuals:
> Min 1Q Median 3Q Max
> -6.4128 -0.0539 0.1215 0.2905 3.5140
>
> Coefficients:
> Estimate Std. Error t value Pr(>|t|)
> (Intercept) -0.22695 0.38631 -0.587 0.560
> explanatoryy 0.02282 0.46490 0.049 0.961
>
> Residual standard error: 1.393 on 40 degrees of freedom
> Multiple R-squared: 6.024e-05, Adjusted R-squared: -0.02494
> F-statistic: 0.00241 on 1 and 40 DF, p-value: 0.961
>
> ##############
>
> output #3
>
> print(compar.gee(slope~explanatory-1,data2,phy=tr2))
> Beginning Cgee S-function, @(#) geeformula.q 4.13 98/01/27
> running glm to get initial regression estimate
> explanatoryn explanatoryy
> -0.2269493 -0.2041275
>
> Call:
> formula: slope ~ explanatory - 1
>
> Number of observations: 42
>
> Model:
> Link: identity
> Variance to Mean Relation: gaussian
>
> Summary of Residuals:
> Min 1Q Median 3Q Max
> -6.1949077 -0.5137222 0.2786307 0.5084768 3.7319044
>
>
> Coefficients:
> Estimate S.E. t Pr(T > |t|)
> explanatoryn 0.3961744 0.7632086 0.5190905 0.6241267
> explanatoryy -0.4220661 0.7595631 -0.5556695 0.6005236
>
> Estimated Scale Parameter: 2.100696
> "Phylogenetic" df (dfP): 7.439024
>>
> ########
> end
>
>
>
> Tom Oliver
> Biological Records Centre
> Centre for Ecology and Hydrology (CEH)
> Maclean Building,
> Benson Lane,
> Crowmarsh Gifford,
> Wallingford, Oxfordshire, OX10 8BB
> Tel: 01491 692517
>
> --
> This message (and any attachments) is for the recipien...{{dropped:21}}
More information about the R-sig-phylo
mailing list