[R-sig-phylo] comparative analysis using multiple regression of contrasts?

Joe Felsenstein joe at gs.washington.edu
Wed May 25 15:10:04 CEST 2011

Folks --

Julien Clause wrote:

> For type II sums of squares you are right; however, when there are
> multiple factors, PICs and GLS usually still give different F
> values and p-values, even if you set the marginality options in the
> analysis of variance. (They are not very different, but I still
> wonder why the results are the same with one explanatory variable but
> differ when you consider several.)
>  As I see it, contrast analyses through the origin are not quite
> usual regressions, since no intercept is estimated; however, they
> give similar output when only one explanatory variable is
> included. Although I did not investigate type I and type II error
> rates when the response was continuous and the explanatory variable
> was a dummy, I still suspect that something remains to be done to
> modify the ancestral character state reconstruction in the contrast
> analysis for the dummy variable, and the computation of its contrasts:
> it is hard to believe that the Brownian model applies to a dummy
> variable, because the expected variance of such a character cannot
> be properly Gaussian; it more likely follows something related to
> the binomial distribution and a logit link. I wonder if
> someone has worked in this direction for contrasts... if not, this
> is probably something interesting I would try to investigate.
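
A minimal sketch of the one-variable case mentioned in the quote. It assumes a bifurcating tree with known branch lengths and Brownian motion. The four-taxon tree, the trait values and the helper names `contrasts`, `edges_to_tips`, `solve` and `gls_slope` are all hypothetical, and plain Python stands in for R. The sketch computes standardized independent contrasts and regresses them through the origin. It then compares that slope with a GLS fit, which includes an intercept and uses an error covariance proportional to the matrix of shared branch lengths. Only the slope estimate is compared here, not the F values or p-values discussed above.

```python
# Sketch (assumptions: bifurcating tree, Brownian motion, hypothetical data).
# Each node is (label, branch_length); a tip's label is a str, an internal
# node's label is the list of its two children.
TREE = ([([("A", 1.0), ("B", 1.0)], 1.0),
         ([("C", 2.0), ("D", 0.5)], 0.5)], 0.0)
X = {"A": 1.0, "B": 2.0, "C": 4.0, "D": 3.0}  # hypothetical explanatory trait
Y = {"A": 2.0, "B": 2.5, "C": 6.0, "D": 4.0}  # hypothetical response trait


def contrasts(node, values):
    """Standardized independent contrasts; returns (value, length, contrasts)."""
    label, length = node
    if isinstance(label, str):
        return values[label], length, []
    (v1, l1, c1), (v2, l2, c2) = (contrasts(ch, values) for ch in label)
    u = (v1 - v2) / (l1 + l2) ** 0.5                  # standardized contrast
    value = (v1 / l1 + v2 / l2) / (1 / l1 + 1 / l2)   # weighted node value
    extra = length + l1 * l2 / (l1 + l2)              # lengthen for its error
    return value, extra, c1 + c2 + [u]


def edges_to_tips(node, key=(), acc=None):
    """Map each tip to {edge_key: length} for the edges on its root path."""
    label, length = node
    acc = dict(acc or {})
    acc[key] = length
    if isinstance(label, str):
        return {label: acc}
    out = {}
    for i, child in enumerate(label):
        out.update(edges_to_tips(child, key + (i,), acc))
    return out


def solve(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    z = [0.0] * n
    for r in reversed(range(n)):
        z[r] = (M[r][n] - sum(M[r][c] * z[c] for c in range(r + 1, n))) / M[r][r]
    return z


def gls_slope(x, y, C):
    """GLS slope of y on x (with intercept), error covariance proportional to C."""
    z1, zx, zy = (solve(C, v) for v in ([1.0] * len(x), x, y))
    w = sum(z1)
    mx = sum(a * b for a, b in zip(z1, x)) / w   # GLS means (root estimates)
    my = sum(a * b for a, b in zip(z1, y)) / w
    dx = [xi - mx for xi in x]
    num = sum(d * (zy[i] - my * z1[i]) for i, d in enumerate(dx))
    den = sum(d * (zx[i] - mx * z1[i]) for i, d in enumerate(dx))
    return num / den


tips = sorted(X)
E = edges_to_tips(TREE)
# Shared root-to-MRCA path lengths: the Brownian-motion covariance matrix.
C = [[sum(v for k, v in E[i].items() if k in E[j]) for j in tips] for i in tips]

ux = contrasts(TREE, X)[2]
uy = contrasts(TREE, Y)[2]
pic_slope = sum(a * b for a, b in zip(ux, uy)) / sum(a * a for a in ux)
gls = gls_slope([X[t] for t in tips], [Y[t] for t in tips], C)
print(pic_slope, gls)  # the two slopes agree up to rounding
```

The GLS means computed inside `gls_slope` are the estimated root states. Centring on them is what removes the intercept, which is why the contrasts can be regressed through the origin.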

I'm trying to understand what sort of model everyone is talking about  
here (not the details of how to do it in R but what process is assumed).

So the "independent" variables are all characters of these species,
and they change along the tree by a process of multivariate Brownian
motion (with, of course, covariation of their changes due to genetic
correlations and selective correlation)? If you have not yet heard
of "selective correlation", please check chapter 24 of my book or
review articles of mine going back to 1988.
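
To make the assumed process concrete, here is a hedged sketch that simulates two characters under multivariate Brownian motion on a hypothetical four-taxon tree. Along a branch of length t, the change is Gaussian with covariance t·R. Here R is a made-up evolutionary covariance matrix standing in for the combined genetic and selective correlations. The helpers `cholesky2`, `simulate` and `cov` are illustrative names only. Under this model, the covariance of character a in species i with character b in species j equals the length of the path the two species share from the root, multiplied by R[a][b]. A Monte Carlo run approximates that covariance.

```python
import math
import random

# Hypothetical tree (nested (label, branch_length) tuples) and a made-up
# evolutionary covariance matrix R for two characters.
TREE = ([([("A", 1.0), ("B", 1.0)], 1.0),
         ([("C", 2.0), ("D", 0.5)], 0.5)], 0.0)
R = [[1.0, 0.6],
     [0.6, 2.0]]


def cholesky2(S):
    """Lower-triangular L with L L^T = S, for a 2x2 positive-definite S."""
    l11 = math.sqrt(S[0][0])
    l21 = S[1][0] / l11
    return [[l11, 0.0], [l21, math.sqrt(S[1][1] - l21 ** 2)]]


def simulate(node, state, L, rng, out):
    """Walk down the tree adding correlated Gaussian change with covariance t*R."""
    label, t = node
    z0, z1 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    s = math.sqrt(t)
    new = [state[0] + s * L[0][0] * z0,
           state[1] + s * (L[1][0] * z0 + L[1][1] * z1)]
    if isinstance(label, str):
        out[label] = new
    else:
        for child in label:
            simulate(child, new, L, rng, out)
    return out


def cov(pairs):
    """Sample covariance of a list of (a, b) pairs."""
    n = len(pairs)
    ma = sum(a for a, _ in pairs) / n
    mb = sum(b for _, b in pairs) / n
    return sum((a - ma) * (b - mb) for a, b in pairs) / (n - 1)


rng = random.Random(1)
L = cholesky2(R)
reps = [simulate(TREE, [0.0, 0.0], L, rng, {}) for _ in range(20000)]

# Expected covariance = (shared path length from the root) * R[a][b]:
cov_AB = cov([(r["A"][0], r["B"][1]) for r in reps])  # 1.0 * 0.6 = 0.6
cov_AA = cov([(r["A"][0], r["A"][1]) for r in reps])  # 2.0 * 0.6 = 1.2
cov_AC = cov([(r["A"][0], r["C"][1]) for r in reps])  # 0.0 * 0.6 = 0.0
print(round(cov_AB, 2), round(cov_AA, 2), round(cov_AC, 2))
```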

OK.   Now what about the response variable (or variables)?  Is it  
assumed that they are also characters that covary with the other  
variables?  In which case, why treat them specially?  Why not just  
infer the mutual covariance matrix of phylogenetic change of all  
variables and use that?  Because in that model the means and  
covariances of the full set of characters are the sufficient  
statistics, and anything else interesting that you want to estimate  
must just be functions of them.
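
A minimal sketch of the sufficiency point. It assumes three characters (y, x1, x2) evolving by multivariate Brownian motion with evolutionary covariance matrix R. Several quantities are then simple functions of R: the regression slopes of y's change on the other characters' changes, the residual rate, and the proportion explained. Suppose R is estimated as the mean cross-product of standardized contrasts. The slopes from the contrast regression through the origin are then exactly these functions of that estimate. To make recovery checkable, R below is built from made-up generating parameters. `cov_from_model` and `regression_from_cov` are hypothetical helpers.

```python
# Sketch under multivariate Brownian motion; all numbers are made up.
# Characters are ordered (y, x1, x2); R is their evolutionary covariance matrix.


def cov_from_model(S, b, noise):
    """Covariance of (y, x1, x2) if y = b[0]*x1 + b[1]*x2 + e, Var(e) = noise."""
    cxy = [b[0] * S[i][0] + b[1] * S[i][1] for i in range(2)]
    vy = sum(b[i] * S[i][j] * b[j] for i in range(2) for j in range(2)) + noise
    return [[vy, cxy[0], cxy[1]],
            [cxy[0], S[0][0], S[0][1]],
            [cxy[1], S[1][0], S[1][1]]]


def regression_from_cov(R):
    """Slopes of y on (x1, x2), residual rate and R^2, computed from R alone."""
    a, c, d = R[1][1], R[1][2], R[2][2]   # Var(x1), Cov(x1,x2), Var(x2)
    e, f = R[1][0], R[2][0]               # Cov(x1,y), Cov(x2,y)
    det = a * d - c * c
    b1 = (e * d - c * f) / det            # Cramer's rule for Sxx b = Sxy
    b2 = (a * f - c * e) / det
    explained = b1 * e + b2 * f
    return (b1, b2), R[0][0] - explained, explained / R[0][0]


S = [[1.0, 0.3], [0.3, 2.0]]              # hypothetical covariance of (x1, x2) change
R = cov_from_model(S, (0.5, -1.0), noise=0.4)
slopes, resid, r2 = regression_from_cov(R)
print(slopes, resid, r2)                  # recovers (0.5, -1.0) and 0.4 up to rounding
```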

Or is it assumed that the response variable is not a character of the  
species?  That the error of its regression on the other variables is  

Regressing on reconstructed ancestral states seems incorrect in any
of these cases: the reconstructions are not observations but linear
functions of the observed characters at the tips of the tree, so one
is just making life harder by regressing on those reconstructions
rather than on the tip phenotypes themselves. But even that seems
wrong if the response variable is a character of the species, as I
just noted.
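
The linearity of the reconstruction can be sketched for the root, assuming Brownian motion on a hypothetical four-taxon tree. Felsenstein's pruning pass, the same weighted averaging used for contrasts, yields the ML estimate of the root state. Running it on unit vectors of tip values exposes its fixed weights. The estimate turns out to be a weighted average of the tip phenotypes, with weights summing to one, so it carries no information beyond the tips. `root_estimate` is a hypothetical helper. Internal-node reconstructions are also linear in the tip values but are not computed here.

```python
# Sketch assuming Brownian motion; tree and tip values are hypothetical.
TREE = ([([("A", 1.0), ("B", 1.0)], 1.0),
         ([("C", 2.0), ("D", 0.5)], 0.5)], 0.0)
TIPS = ["A", "B", "C", "D"]


def root_estimate(node, values):
    """Pruning pass; at the root it gives the ML root state under Brownian motion.

    Returns (weighted value, branch length lengthened for estimation error)."""
    label, length = node
    if isinstance(label, str):
        return values[label], length
    (v1, l1), (v2, l2) = (root_estimate(ch, values) for ch in label)
    value = (v1 / l1 + v2 / l2) / (1 / l1 + 1 / l2)
    return value, length + l1 * l2 / (l1 + l2)


# Reconstructing from each unit vector of tip values exposes the fixed weights.
weights = {t: root_estimate(TREE, {s: float(s == t) for s in TIPS})[0]
           for t in TIPS}

x = {"A": 1.0, "B": 2.0, "C": 4.0, "D": 3.0}
direct = root_estimate(TREE, x)[0]
as_linear = sum(weights[t] * x[t] for t in TIPS)
print(weights, direct, as_linear)  # same estimate; weights sum to 1
```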

Someone please educate me.

Joe Felsenstein, joe at gs.washington.edu
  Dept. of Genome Sciences, Univ. of Washington
  Box 355065, Seattle, WA 98195-5065 USA
