[R-sig-phylo] comparative analysis using multiple regression of contrasts?

Simon Blomberg s.blomberg1 at uq.edu.au
Wed May 25 08:26:41 CEST 2011

On 24/05/11 01:38, Julien Claude wrote:
> Dear all,
> I have one factor and several covariates and I would like to know which
> of these explanatory variables are more likely to explain a response
> variable in a comparative analysis. The factor is a dummy variable (0-1).
> At first, my strategy would be to use contrasts of all variables and
> then producing a type II anova on contrasts, I am however not sure that
> I am right in doing so with the dummy variable.
> data(bird.orders)
> Y<-rnorm(23)
> X1<-rbinom(23,1,0.5)
> X2<-rnorm(23)
> pY<-pic(Y, bird.orders)
> pX1<-pic(X1, bird.orders)
> pX2<-pic(X2, bird.orders)
> library(car)
> Anova(lm(pY~pX1*pX2-1))
> Can we directly use the contrast for the dummy variable?, or should we
> transform this contrast in specifying some special stuff behind the
> expected ancestral values (The dummy variable probably does not follow a
> brownian motion model...). The solution may be here but I have no clear
> idea about how to transform it.

Yes, you can directly use the contrasts of the dummy variable. There is 
no problem with non-Brownian explanatory variables in pic regression. 
The usual regression model assumes all the covariance is in the response 
variable only. The calculation of contrasts for the explanatory variable 
is a necessary step to getting the correct parameter estimates. No 
further meaning should be attached. (Although perhaps I hold an extreme 
view on this.)

> I wonder also how to adopt this strategy by using gls.
> DF.bird<- data.frame(X1,X2, Y)
> bm.bird<- corBrownian(phy = bird.orders)
> m1<- gls(Y~X1*X2, correlation = bm.bird, data = DF.bird)
> m2<- gls(Y~X2*X1, correlation = bm.bird, data = DF.bird)
> anova(m1)
> How to get the mean squares  and variance estimate in pgls ? With a one
> factor analysis F-values are exactly the same, but when the number of
> covariates is greater than 1, F values  can differ in some extent, I
> wonder why.
that's because by default, the analysis gives you "sequential" sums of 
squares tests (Type I for the SAS people). To get type II (marginal) 
tests, which are not affected by the order of fitting, use

anova(m1, type="marginal")

>   In the gls stuff, I understand that incorporating the
> expected correlation of the response should follow a BM, but this is
> certainly not true for all the predictors)?

It doesn't matter for the predictor variables. The covariances of the 
predictor variables is not used in the calculation of the gls 
regression. If you want to take the covariances of the predictors into 
account, you need to use an errors-in-variables model. Or you could 
calculate non-central correlations ("correlation through the origin") 
instead of regression.

> Let me know if this strategy makes sense or if it is flawed....Thanks
> for your input
> Julien
> -------------------
> Julien CLAUDE
> Institut des Sciences de l'Evolution, 34095 Montpellier cedex 5
> Université de Montpellier 2
> _______________________________________________
> R-sig-phylo mailing list
> R-sig-phylo at r-project.org
> https://stat.ethz.ch/mailman/listinfo/r-sig-phylo

Simon Blomberg, BSc (Hons), PhD, MAppStat.
Lecturer and Consultant Statistician
School of Biological Sciences
The University of Queensland
St. Lucia Queensland 4072
T: +61 7 3365 2506
email: S.Blomberg1_at_uq.edu.au

1.  I will NOT analyse your data for you.
2.  Your deadline is your problem

Statistics is the grammar of science - Karl Pearson.

More information about the R-sig-phylo mailing list