[R-sig-phylo] PIC vs. PGLS
Emmanuel Paradis
Emmanuel.Paradis at ird.fr
Mon Sep 27 19:08:26 CEST 2010
Enrico Rezende wrote on 27/09/2010 18:26:
> Hi everybody,
> I am able replicate the results of independent contrasts and PGLS
> employing R (mapping the intercept to the phylogenetic means). I also
> recall comparing results of PDAP employing a star phylogeny against a
> regular regression in SPSS for Windows. And as far as I can tell, this
> applies to any phylogeny, not only ultrametric ones. So it is indeed
> strange that Emmanuel's code does not obtain the same values.
That's because the VCV matrix is transformed to a correlation matrix by
gls(), so the (residual) variance is homogeneous among observations.
Simon pointed this out in a post on this list some months ago. gls() can
model heterogeneous variance with its 'weights' option. Take some random
data (with a non-ultrametric tree):
> library(ape)
> x <- rnorm(10)
> y <- rnorm(10)
> tr <- rtree(10)
> co <- corBrownian(1, tr)
Calculate the PICs and do lm through the origin:
> py <- pic(y, tr)
> px <- pic(x, tr)
> lm(py ~ px - 1)$coef
px
-0.6132837
The GLS (assuming homoscedasticity):
> gls(y ~ x, correlation=co)$coef
(Intercept) x
0.4126305 -0.5314185
Extract the expected variances from the VCV matrix and build a var
function with a fixed term ('offset' is standard to add a constant term
in a formula):
> v <- diag(vcv(co))
> vf <- varFixed(~ offset(v))
Finally:
> gls(y ~ x, correlation=co, weights=vf)$coef
(Intercept) x
0.4326779 -0.6132837
Cannot be closer to the PIC model. And both models have the same number
of df (4 - 2 for the PICs; 5 - 3 for the GLS).
Emmanuel
> Cheers,
> Enrico
>
>
>
>
> El 9/27/10 6:09 PM, tgarland at ucr.edu escribió:
>> Hi All,
>>
>> Responding to two of Emmanuel's points.
>>
>>
>>> You can find in my book the R code to get exactly the same coefficients
>>> with PICs and PGLS. This works as long as the tree is ultrametric (for
>>> equal variance). It's not a formal proof, of course, but a strong
>>> suspicion.
>>>
>> You also get the same numbers if you compute standardized independent
>> contrasts in, say, PDAP of Mesquite and compare with the PGLS results
>> from our Regressionv2.m Matlab program (Lavin et al., 2008). And,
>> this applies not only for ultrametric trees (contemporaneous tips).
>> Emmanuel, does your R code not get the same numbers (e.g., regression
>> equations) if the trees are non-ultrametric? Why not?
>>
>>
>>> I share some of Simon's concern but for a different reason. I see no
>>> problem to build complex models (and what Ted describes seems quite
>>> exciting) as long as you do that in a hierarchical and consistent way
>>> leading to the ability to test whether your added complexity is
>>> statistically significant. For instance, it is not clear to me how to
>>> test that different predictors have distinct correlation structures.
>>>
>> Those are good points. As a heuristic, you can always do a univariate
>> fit of different independent variables and look at the lm maximum
>> likelihoods on a star and on hierarchical trees with different sets of
>> branch lengths. If one independent variable showed a higher ln ML on
>> a star whereas another showed a higher lm MN on a hierarchical tree,
>> then this would be evidence that their correlations structures are
>> different. This could be done with our PHYSIG_LL.m Matlab program
>> (from Blomberg et al., 2003) or with Regressionv2.m (just pretend you
>> are doing a regression with the trait of interest as the dependent
>> variable and tell it you have no independent variables). I know that
>> does not give a statistical test, but it would be an indicator. I am
>> sure you could figure out a way to do a formal test via randomization
>> or simulation procedures. But, maybe there is also a good way through
>> what Simon mentioned:
>>
>>
>>> We have developed Bayesian methods to fit ME models for
>>> phylogenetic data in OpenBUGS and JAGS (submitted to Evolution).
>>>
>> Simon, I'd like to see the paper if you are willing to share at this
>> stage.
>>
>> Cheers,
>> Ted
>>
>> ---- Original message ----
>>
>> Date: Mon, 27 Sep 2010 17:39:08 +0200
>> From: Emmanuel Paradis<Emmanuel.Paradis at ird.fr>
>> Subject: Re: [R-sig-phylo] PIC vs. PGLS
>> To: Simon Blomberg<s.blomberg1 at uq.edu.au>
>> Cc: r-sig-phylo at r-project.org
>>
>> >Hi all,
>> >
>> >Simon Blomberg wrote on 27/09/2010 08:10:
>> >> Hi Anne, Ted, Liam et al!,
>> >>
>> >>
>> >> On 25/09/10 02:51, tgarland at ucr.edu wrote:
>> >>> Hi Anne,
>> >>>
>> >>> I am going to put this back online so others might
>> benefit or chime in.
>> >>> If you literally want to do correlations, then
>> phylogenetically
>> >>> independent contrasts may be easiest. You just compute
>> the
>> >>> standardized contrasts for each of your traits, then
>> compute
>> >>> correlations (through the origin).
>> >>>
>> >>
>> >> The correct statistical term is "noncentral
>> correlation." Correlation
>> >> "through the origin" makes no sense, as there is no
>> line involved here.
>> >> Nothing is going through the origin. (Although I guess
>> we all know what
>> >> we mean.)
>> >>> You can also handle interactions with contrasts. You
>> just need to
>> >>> compute the usual 0-1 dummy variables, compute
>> contrasts of them,
>> >>> etc. This was discussed here:
>> >>>
>> >>> Garland, T., Jr., P. H. Harvey, and A. R. Ives. 1992.
>> Procedures for
>> >>> the analysis of comparative data using
>> phylogenetically independent
>> >>> contrasts. Systematic Biology 41:18-32.
>> >>> Clobert, J., T. Garland, Jr., and R. Barbault. 1998.
>> The evolution of
>> >>> demographic tactics in lizards: a test of some
>> hypotheses concerning
>> >>> life history evolution. Journal of Evolutionary
>> Biology 11:329-364.
>> >>>
>> >>> However, then you are most likely going to want to be
>> back into
>> >>> regression models. At this point, I refer you back to
>> what Liam
>> >>> Revell wrote.
>> >>>
>> >>> Another interesting technical point. In general, PIC
>> and PGLS are the
>> >>> same, especially if you stick with a simple Brownian
>> motion model of
>> >>> character evolution. However, their complete
>> mathematical identity
>> >>> has not, to my knowledge, been proven.
>> >> I have a proof. I have a paper submitted to Sys. Biol.
>> on the topic.
>> >
>> >You can find in my book the R code to get exactly the
>> same coefficients
>> >with PICs and PGLS. This works as long as the tree is
>> ultrametric (for
>> >equal variance). It's not a formal proof, of course, but
>> a strong suspicion.
>> >
>> >>> And, some things that can easily be done with PUC
>> cannot easily be
>> >>> done with PGLS, partly because the proofs don't yet
>> exist but also
>> >>> because the code to do it does not exist (so far as I
>> am aware). For
>> >>> example, with PIC you can easily compute contrasts on
>> a hierarchical
>> >>> tree for one trait, but on star with another. Or, you
>> can use
>> >>> different sets of branch lengths for different traits
>> if that made
>> >>> biological or statistical sense to do. We have used
>> this ability
>> >>> sometimes when we have "nuisance variables" that need
>> to be
>> >>> incorporated in our set of independent variables in a
>> multiple
>> >>> regression model. Given that the nuisance variable is
>> not
>> >>> phylogenetically inherited, it makes sense to compute
>> contrasts on a
>> >>> star. We have done t!
>> >>>
>> >>
>> >> From a biological point of view, I understand why you
>> would want to do
>> >> that. But be aware that this means the resulting
>> estimator will not have
>> >> minimum variance. Since the minimum variance property
>> is a very useful
>> >> property of PIC/GLS, I don't recommend Ted's approach.
>> Keep the same
>> >> tree for response AND explanatory variables. That way
>> you are using the
>> >> Minimum Variance Unbiased Estimator (MVUE). If you
>> really want to use
>> >> different trees/branch lengths for explanatory and
>> response variables,
>> >> then you should ditch GLS/PIC and move to a Measurement
>> Error (Errors in
>> >> Variables) model that explicitly incorporates
>> covariance in explanatory
>> >> variables. We have developed Bayesian methods to fit ME
>> models for
>> >> phylogenetic data in OpenBUGS and JAGS (submitted to
>> Evolution). A good
>> >> introduction to ME models is Carrol, Ruppert,
>> Stafanski, Crainiceanu
>> >> (2006). Measurement Error in Nonlinear Models. A Modern
>> Perspective.
>> >> Chapman& Hall.
>> >
>> >I share some of Simon's concern but for a different
>> reason. I see no
>> >problem to build complex models (and what Ted describes
>> seems quite
>> >exciting) as long as you do that in a hierarchical and
>> consistent way
>> >leading to the ability to test whether your added
>> complexity is
>> >statistically significant. For instance, it is not clear
>> to me how to
>> >test that different predictors have distinct correlation
>> structures.
>> >
>> >Emmanuel
>> >
>> >>> hi!
>> >>> !
>> >>> s sort of thing here:
>> >>>
>> >>> Wolf, C. M., T. Garland, Jr., and B. Griffith. 1998.
>> Predictors of
>> >>> avian and mammalian translocation success: reanalysis
>> with
>> >>> phylogenetically independent contrasts. Biological
>> Conservation
>> >>> 86:243-255.
>> >>> Perry, G., and T. Garland, Jr. 2002. Lizard home
>> ranges revisited:
>> >>> effects of sex, body size, diet, habitat, and
>> phylogeny. Ecology
>> >>> 83:1870-1885.
>> >>>
>> >>> All of this then relates back to what Liam wrote.
>> Anyway, none of
>> >>> these questions are entirely simple, nor the answers
>> entirely
>> >>> straightforward.
>> >>>
>> >> If the answers were straightforward, it wouldn't be so
>> much fun!
>> >>
>> >> Cheers,
>> >>
>> >> Simon.
>> >>
>> >>> Cheers,
>> >>> Ted
>> >>>
>> >>>
>> >>> ---- Original message ----
>> >>>
>> >>> Date: Fri, 24 Sep 2010 17:56:46 +0200
>> >>> From: Anne Kempel<kempel at ips.unibe.ch>
>> >>> Subject: Re: [R-sig-phylo] (no subject)
>> >>> To: tgarland at ucr.edu
>> >>>
>> >>> >Hi Ted,
>> >>> >I guess I do regression models, since this GLS method
>> >>> only does
>> >>> >regressions (I have an interaction term, thats why I
>> use
>> >>> the
>> >>> >phylogenetic GLS and not the PIC method). But
>> >>> biologically, for those
>> >>> >species traits I am analysing each one could be
>> dependent
>> >>> - e.g. the
>> >>> >plants growth rate might depend on plant resistance,
>> but
>> >>> plant
>> >>> >resistance also depends on the growth rate.
>> >>> >All the best,
>> >>> >anne
>> >>> >
>> >>> >
>> >>> >On 24.09.2010 17:33, tgarland at ucr.edu wrote:
>> >>> >> Are you doing regression models or correlations per
>> se?
>> >>> >>
>> >>> >> Cheers,
>> >>> >> Ted
>> >>> >>
>> >>>
>> >>> Date: Fri 24 Sep 08:43:52 PDT 2010
>> >>> From: r-sig-phylo-bounces at r-project.org Add To Address
>> Book | This is
>> >>> Spam (on behalf of "Liam J.
>> Revell"<lrevell at nescent.org>)
>> >>> Subject: Re: [R-sig-phylo] (no subject)
>> >>> To: Anne Kempel<kempel at ips.unibe.ch>
>> >>> Cc: r-sig-phylo at r-project.org
>> >>>
>> >>> Hi Anne,
>> >>>
>> >>> You should build your model based on your scientific
>> hypothesis - not on
>> >>> which trait shows phylogenetic signal. However, GLS
>> "corrects for"
>> >>> non-independence in the residual error of y given X -
>> non-independence
>> >>> which may be due to (for instance) phylogenetic
>> history. Incidentally,
>> >>> if our observations for X are non-independent due to
>> the phylogeny, but
>> >>> the residual error in y given X is uncorrelated, than
>> GLS is not
>> >>> necessary (and will actually give us an estimate with
>> inflated variance).
>> >>>
>> >>> I have a paper about exactly this topic that was
>> recently published
>> >>> online in the new journal "Methods in Ecology and
>> Evolution." The
>> >>> citation is:
>> >>> Revell, L. J. 2010. Phylogenetic signal and linear
>> regression on species
>> >>> data. /Methods in Ecology and Evolution/ Online Early
>> View.
>> >>> And the article can be found at the following URL:
>> >>> http://dx.doi.org/10.1111/j.2041-210X.2010.00044.x or
>> on my website (URL
>> >>> below).
>> >>>
>> >>> I hope this is of some help.
>> >>>
>> >>> - Liam
>> >>>
>> >>> Liam J. Revell
>> >>> NESCent, Duke University
>> >>> web: http://anolis.oeb.harvard.edu/~liam/
>> >>> NEW email: lrevell at nescent.org
>> >>>
>> >>> _______________________________________________
>> >>> R-sig-phylo mailing list
>> >>> R-sig-phylo at r-project.org
>> >>> https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
>> >>>
>> >>
>> >
>> >--
>> >Emmanuel Paradis
>> >IRD, Montpellier, France
>> > ph: +33 (0)4 67 16 64 47
>> > fax: +33 (0)4 67 16 64 40
>> >http://ape.mpl.ird.fr/
>> >
>> >_______________________________________________
>> >R-sig-phylo mailing list
>> >R-sig-phylo at r-project.org
>> >https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
>>
>> _______________________________________________
>> R-sig-phylo mailing list
>> R-sig-phylo at r-project.org
>> https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
>>
>>
>
>
--
Emmanuel Paradis
IRD, Montpellier, France
ph: +33 (0)4 67 16 64 47
fax: +33 (0)4 67 16 64 40
http://ape.mpl.ird.fr/
More information about the R-sig-phylo
mailing list