# [R-sig-phylo] comparative analysis using multiple regression of contrasts?

jclaude at univ-montp2.fr jclaude at univ-montp2.fr
Wed May 25 11:38:32 CEST 2011

```Thanks Simon,

this is exactly the answer I was looking for.

for type II sums of squares you are right, however, when there are
multiple factors pics and gls usually still provide different F values
and p-values even if you set the marginality stuff in the analysis of
variance. (they are not much different, but i still wonder why results
are the same with one explanatory variable but different when you
consider several).

As I see it, contrast analyses through the origin are not a so usual
regressions since no intercept is estimated: they however result in
similar output when only one explanatory variable is included.
Although I did not investigate type I and type II error rate when the
response was continuous and the explanatory variable was dummy, I
still guess that there are still something to do for modifying
ancestral character state reconstruction in the contrast analysis for
the dummy variable and computing their contrast: it is hard to believe
that the brownian model will apply to that dummy variable because the
expected variance of such a character can not be properly gaussian,
but certainly more following something that has to see with the
binomial law and logit family link. I wonder if someone has worked in
this direction for contrasts...if not, this is probably something
interesting i would try to investigate.

all the best

julien

Simon Blomberg <s.blomberg1 at uq.edu.au> a écrit :

> On 24/05/11 01:38, Julien Claude wrote:
>> Dear all,
>>
>> I have one factor and several covariates and I would like to know which
>> of these explanatory variables are more likely to explain a response
>> variable in a comparative analysis. The factor is a dummy variable (0-1).
>>
>> At first, my strategy would be to use contrasts of all variables and
>> then producing a type II anova on contrasts, I am however not sure that
>> I am right in doing so with the dummy variable.
>>
>> data(bird.orders)
>> Y<-rnorm(23)
>> X1<-rbinom(23,1,0.5)
>> X2<-rnorm(23)
>>
>> pY<-pic(Y, bird.orders)
>> pX1<-pic(X1, bird.orders)
>> pX2<-pic(X2, bird.orders)
>>
>> library(car)
>> Anova(lm(pY~pX1*pX2-1))
>>
>> Can we directly use the contrast for the dummy variable?, or should we
>> transform this contrast in specifying some special stuff behind the
>> expected ancestral values (The dummy variable probably does not follow a
>> brownian motion model...). The solution may be here but I have no clear
>> idea about how to transform it.
>
> Yes, you can directly use the contrasts of the dummy variable. There
> is no problem with non-Brownian explanatory variables in pic
> regression. The usual regression model assumes all the covariance is
> in the response variable only. The calculation of contrasts for the
> explanatory variable is a necessary step to getting the correct
> parameter estimates. No further meaning should be attached.
> (Although perhaps I hold an extreme view on this.)
>
>> I wonder also how to adopt this strategy by using gls.
>> DF.bird<- data.frame(X1,X2, Y)
>> bm.bird<- corBrownian(phy = bird.orders)
>> m1<- gls(Y~X1*X2, correlation = bm.bird, data = DF.bird)
>> m2<- gls(Y~X2*X1, correlation = bm.bird, data = DF.bird)
>> anova(m1)
>>
>> How to get the mean squares  and variance estimate in pgls ? With a one
>> factor analysis F-values are exactly the same, but when the number of
>> covariates is greater than 1, F values  can differ in some extent, I
>> wonder why.
> that's because by default, the analysis gives you "sequential" sums
> of squares tests (Type I for the SAS people). To get type II
> (marginal) tests, which are not affected by the order of fitting, use
>
> anova(m1, type="marginal")
>
>
>>  In the gls stuff, I understand that incorporating the
>> expected correlation of the response should follow a BM, but this is
>> certainly not true for all the predictors)?
>
> It doesn't matter for the predictor variables. The covariances of
> the predictor variables is not used in the calculation of the gls
> regression. If you want to take the covariances of the predictors
> into account, you need to use an errors-in-variables model. Or you
> could calculate non-central correlations ("correlation through the
>
> Simon.
>> Let me know if this strategy makes sense or if it is flawed....Thanks
>>
>> Julien
>>
>> -------------------
>> Julien CLAUDE
>> Institut des Sciences de l'Evolution, 34095 Montpellier cedex 5
>> Université de Montpellier 2
>>
>> _______________________________________________
>> R-sig-phylo mailing list
>> R-sig-phylo at r-project.org
>> https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
>
>
> --
> Simon Blomberg, BSc (Hons), PhD, MAppStat.
> Lecturer and Consultant Statistician
> School of Biological Sciences
> The University of Queensland
> St. Lucia Queensland 4072
> Australia
> T: +61 7 3365 2506
> email: S.Blomberg1_at_uq.edu.au
> http://www.uq.edu.au/~uqsblomb/
>
> Policies:
> 1.  I will NOT analyse your data for you.
>
> Statistics is the grammar of science - Karl Pearson.
>
> _______________________________________________
> R-sig-phylo mailing list
> R-sig-phylo at r-project.org
> https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
>
>

```