[R-sig-phylo] comparative analysis using multiple regression of contrasts?
Joe Felsenstein
joe at gs.washington.edu
Wed May 25 15:10:04 CEST 2011
Folks --
Julien Clause wrote:
> for type II sums of squares you are right; however, when there are
> multiple factors, PICs and GLS usually still give different F
> values and p-values, even if you set the marginality stuff in the
> analysis of variance. (They are not much different, but I still
> wonder why the results are the same with one explanatory variable but
> different when you consider several.)
>
> As I see it, contrast analyses through the origin are not quite
> usual regressions, since no intercept is estimated; they do, however,
> give similar output when only one explanatory variable is
> included. Although I did not investigate type I and type II error
> rates when the response is continuous and the explanatory variable
> is a dummy variable, I still guess that there is something to be
> done to modify the ancestral character state reconstruction in the
> contrast analysis for the dummy variable and the computation of its
> contrasts: it is hard to believe that the Brownian model applies to
> that dummy variable, because the expected variance of such a
> character cannot be properly Gaussian, and more likely follows
> something to do with the binomial law and a logit link. I wonder if
> someone has worked in this direction for contrasts... if not, this
> is probably something interesting I would try to investigate.
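
The single-predictor agreement described in the quoted message can be checked numerically. What follows is only a minimal sketch, not anyone's actual analysis: the three-species tree ((A:1,B:1):1,C:2), the trait values, and all function names are made up for illustration. It compares the slope from a regression of standardized contrasts through the origin with the GLS slope under Brownian motion on the same tree.

```python
import math

# Hypothetical tree ((A:1,B:1):1,C:2) and made-up trait values for tips A, B, C.
x = [1.0, 2.0, 4.0]
y = [2.0, 3.0, 7.0]

def contrasts(t):
    """Standardized contrasts for the fixed tree ((A:1,B:1):1,C:2)."""
    a, b, c = t
    v_a, v_b, v_ab, v_c = 1.0, 1.0, 1.0, 2.0
    c1 = (a - b) / math.sqrt(v_a + v_b)
    # Weighted average at the A/B node, and its lengthened branch.
    node = (a / v_a + b / v_b) / (1 / v_a + 1 / v_b)
    v_ext = v_ab + v_a * v_b / (v_a + v_b)
    c2 = (node - c) / math.sqrt(v_ext + v_c)
    return [c1, c2]

def dot(u, v):
    return sum(p * q for p, q in zip(u, v))

def pic_slope(x, y):
    """Slope of y contrasts on x contrasts, forced through the origin."""
    cx, cy = contrasts(x), contrasts(y)
    return dot(cx, cy) / dot(cx, cx)

def solve(a, b):
    """Solve a z = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [u - f * w for u, w in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def gls_slope(x, y):
    """GLS slope of y on x (with intercept) under Brownian motion."""
    # Tip covariance: shared path length from the root for each pair of tips.
    cov = [[2.0, 1.0, 0.0], [1.0, 2.0, 0.0], [0.0, 0.0, 2.0]]
    ones = [1.0] * 3
    w1, wx = solve(cov, ones), solve(cov, x)  # C^-1 1 and C^-1 x
    s11, s1x, sxx = dot(ones, w1), dot(x, w1), dot(x, wx)
    s1y, sxy = dot(y, w1), dot(y, wx)
    # Slope from the 2x2 normal equations (intercept and x).
    return (s11 * sxy - s1x * s1y) / (s11 * sxx - s1x * s1x)
```

For these made-up values, `pic_slope(x, y)` and `gls_slope(x, y)` both come out as 1.625. The sketch covers only the one-predictor case and says nothing about why F values might differ with several factors.
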
I'm trying to understand what sort of model everyone is talking about
here (not the details of how to do it in R, but what process is assumed).
So the "independent" variables are all characters of these species,
and they change along the tree by a process of multivariate Brownian
motion (with, of course, covariation of their changes due to genetic
correlations and selective correlation)? If you have not yet heard
of "selective correlation", please check chapter 24 of my book or
review articles of mine going back to 1988.
OK. Now what about the response variable (or variables)? Is it
assumed that they are also characters that covary with the other
variables? In which case, why treat them specially? Why not just
infer the mutual covariance matrix of phylogenetic change of all
variables and use that? Because in that model the means and
covariances of the full set of characters are the sufficient
statistics, and anything else interesting that you want to estimate
must just be functions of them.
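
One way to read the point about sufficient statistics: once the covariance matrix of evolutionary change has been estimated, the regression coefficients of any one character on the others are functions of that matrix. A minimal sketch, assuming standardized contrasts have already been computed for three characters; the numbers and function names are hypothetical.

```python
# Hypothetical standardized contrasts: one row per contrast, columns x1, x2, y.
contrasts = [
    [0.5, -1.0, 0.2],
    [-1.2, 0.3, -0.9],
    [0.8, 0.7, 1.1],
    [0.1, -0.4, -0.3],
]

def change_covariance(c):
    """Covariance of evolutionary change: mean outer product of contrasts.

    Contrasts have expectation zero, so no mean is subtracted.
    """
    n, p = len(c), len(c[0])
    return [[sum(r[i] * r[j] for r in c) / n for j in range(p)]
            for i in range(p)]

def slopes_of_last_on_first_two(s):
    """Partial slopes of variable 3 on variables 1 and 2, using s alone."""
    a, b, d = s[0][0], s[0][1], s[1][1]
    g1, g2 = s[0][2], s[1][2]
    det = a * d - b * b
    return [(d * g1 - b * g2) / det, (a * g2 - b * g1) / det]

def residuals(c, slopes):
    """Residuals of the y contrasts from a through-origin fit."""
    return [r[2] - slopes[0] * r[0] - slopes[1] * r[1] for r in c]

S = change_covariance(contrasts)
b = slopes_of_last_on_first_two(S)
```

The slopes in `b` are the same ones a multiple regression of the y contrasts on the x1 and x2 contrasts through the origin would give, since the residuals are orthogonal to both predictor columns. Nothing beyond `S` was needed to get them.
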
Or is it assumed that the response variable is not a character of the
species? That the error of its regression on the other variables is
i.i.d.?
Regressing on reconstructed ancestral states does not seem correct in
any of these cases. Those states are not observations; they are linear
functions of the observed characters at the tips of the tree, so one
is just making life harder by regressing on the reconstruction rather
than on the tip phenotypes. But even regressing on the tip phenotypes
seems wrong if the response variable is a character of the species,
as I just noted.
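
The remark that reconstructed ancestral states are linear functions of the tip values can be made concrete. This is a sketch for a hypothetical tree ((A:1,B:1):1,C:2), using the weighted averages computed during the contrasts algorithm (an assumption about which reconstruction is meant). The function names are illustrative; the code returns the weight each tip receives at each internal node.

```python
def node_weights(v_a=1.0, v_b=1.0, v_ab=1.0, v_c=2.0):
    """Weights on tips (A, B, C) for the A/B node value and the root value."""
    # A/B node: inverse-branch-length weighted average of tips A and B.
    w_a = (1 / v_a) / (1 / v_a + 1 / v_b)
    ab = [w_a, 1 - w_a, 0.0]
    # The branch below the A/B node is lengthened to reflect that the
    # node value is an estimate rather than an observation.
    v_ext = v_ab + v_a * v_b / (v_a + v_b)
    w_ab = (1 / v_ext) / (1 / v_ext + 1 / v_c)
    root = [w_ab * ab[0], w_ab * ab[1], 1 - w_ab]
    return ab, root

def reconstruct(weights, tips):
    """A reconstructed state is just a weighted sum of the tip values."""
    return sum(w * t for w, t in zip(weights, tips))

ab_w, root_w = node_weights()
```

With the default branch lengths the root weights are 2/7, 2/7 and 3/7, so the reconstructed values carry no information beyond what is already in the three tip phenotypes.
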
Someone please educate me.
Joe
----
Joe Felsenstein, joe at gs.washington.edu
Dept. of Genome Sciences, Univ. of Washington
Box 355065, Seattle, WA 98195-5065 USA