[R-sig-phylo] PIC vs. PGLS

Mon Sep 27 19:08:26 CEST 2010

```Enrico Rezende wrote on 27/09/2010 18:26:
> Hi everybody,
> I am able replicate the results of independent contrasts and PGLS
> employing R (mapping the intercept to the phylogenetic means). I also
> recall comparing results of PDAP employing a star phylogeny against a
> regular regression in SPSS for Windows. And as far as I can tell, this
> applies to any phylogeny, not only ultrametric ones. So it is indeed
> strange that Emmanuel's code does not obtain the same values.

That's because the VCV matrix is transformed to a correlation matrix by
gls(), so the (residual) variance is homogeneous among observations.
Simon pointed this out in a post on this list some months ago. gls() can
model heterogeneous variance with its 'weights' option. Take some random
data (with a non-ultrametric tree):

> library(ape)
> x <- rnorm(10)
> y <- rnorm(10)
> tr <- rtree(10)
> co <- corBrownian(1, tr)

Calculate the PICs and do lm through the origin:

> py <- pic(y, tr)
> px <- pic(x, tr)
> lm(py ~ px - 1)\$coef
px
-0.6132837

The GLS (assuming homoscedasticity):

> gls(y ~ x, correlation=co)\$coef
(Intercept)           x
0.4126305  -0.5314185

Extract the expected variances from the VCV matrix and build a var
function with a fixed term ('offset' is standard to add a constant term
in a formula):

> v <- diag(vcv(co))
> vf <- varFixed(~ offset(v))

Finally:

> gls(y ~ x, correlation=co, weights=vf)\$coef
(Intercept)           x
0.4326779  -0.6132837

Cannot be closer to the PIC model. And both models have the same number
of df (4 - 2 for the PICs; 5 - 3 for the GLS).

Emmanuel

> Cheers,
> Enrico
>
>
>
>
> El 9/27/10 6:09 PM, tgarland at ucr.edu escribió:
>> Hi All,
>>
>> Responding to two of Emmanuel's points.
>>
>>
>>> You can find in my book the R code to get exactly the same coefficients
>>> with PICs and PGLS.  This works as long as the tree is ultrametric (for
>>> equal variance).  It's not a formal proof, of course, but a strong
>>> suspicion.
>>>
>> You also get the same numbers if you compute standardized independent
>> contrasts in, say, PDAP of Mesquite and compare with the PGLS results
>> from our Regressionv2.m Matlab program (Lavin et al., 2008).  And,
>> this applies not only for ultrametric trees (contemporaneous tips).
>> Emmanuel, does your R code not get the same numbers (e.g., regression
>> equations) if the trees are non-ultrametric?  Why not?
>>
>>
>>> I share some of Simon's concern but for a different reason.  I see no
>>> problem to build complex models (and what Ted describes seems quite
>>> exciting) as long as you do that in a hierarchical and consistent way
>>> statistically significant.  For instance, it is not clear to me how to
>>> test that different predictors have distinct correlation structures.
>>>
>> Those are good points.  As a heuristic, you can always do a univariate
>> fit of different independent variables and look at the lm maximum
>> likelihoods on a star and on hierarchical trees with different sets of
>> branch lengths.  If one independent variable showed a higher ln ML on
>> a star whereas another showed a higher lm MN on a hierarchical tree,
>> then this would be evidence that their correlations structures are
>> different.  This could be done with our PHYSIG_LL.m Matlab program
>> (from Blomberg et al., 2003) or with Regressionv2.m (just pretend you
>> are doing a regression with the trait of interest as the dependent
>> variable and tell it you have no independent variables).  I know that
>> does not give a statistical test, but it would be an indicator.  I am
>> sure you could figure out a way to do a formal test via randomization
>> or simulation procedures.  But, maybe there is also a good way through
>> what Simon mentioned:
>>
>>
>>> We have developed Bayesian methods to fit ME models for
>>> phylogenetic data in OpenBUGS and JAGS (submitted to Evolution).
>>>
>> Simon, I'd like to see the paper if you are willing to share at this
>> stage.
>>
>> Cheers,
>> Ted
>>
>>    ---- Original message ----
>>
>>      Date: Mon, 27 Sep 2010 17:39:08 +0200
>>      Subject: Re: [R-sig-phylo] PIC vs. PGLS
>>      To: Simon Blomberg<s.blomberg1 at uq.edu.au>
>>      Cc: r-sig-phylo at r-project.org
>>
>>      >Hi all,
>>      >
>>      >Simon Blomberg wrote on 27/09/2010 08:10:
>>      >>  Hi Anne, Ted, Liam et al!,
>>      >>
>>      >>
>>      >>  On 25/09/10 02:51, tgarland at ucr.edu wrote:
>>      >>>  Hi Anne,
>>      >>>
>>      >>>  I am going to put this back online so others might
>>      benefit or chime in.
>>      >>>  If you literally want to do correlations, then
>>      phylogenetically
>>      >>>  independent contrasts may be easiest. You just compute
>>      the
>>      >>>  standardized contrasts for each of your traits, then
>>      compute
>>      >>>  correlations (through the origin).
>>      >>>
>>      >>
>>      >>  The correct statistical term is "noncentral
>>      correlation." Correlation
>>      >>  "through the origin" makes no sense, as there is no
>>      line involved here.
>>      >>  Nothing is going through the origin. (Although I guess
>>      we all know what
>>      >>  we mean.)
>>      >>>  You can also handle interactions with contrasts. You
>>      just need to
>>      >>>  compute the usual 0-1 dummy variables, compute
>>      contrasts of them,
>>      >>>  etc. This was discussed here:
>>      >>>
>>      >>>  Garland, T., Jr., P. H. Harvey, and A. R. Ives. 1992.
>>      Procedures for
>>      >>>  the analysis of comparative data using
>>      phylogenetically independent
>>      >>>  contrasts. Systematic Biology 41:18-32.
>>      >>>  Clobert, J., T. Garland, Jr., and R. Barbault. 1998.
>>      The evolution of
>>      >>>  demographic tactics in lizards: a test of some
>>      hypotheses concerning
>>      >>>  life history evolution. Journal of Evolutionary
>>      Biology 11:329-364.
>>      >>>
>>      >>>  However, then you are most likely going to want to be
>>      back into
>>      >>>  regression models. At this point, I refer you back to
>>      what Liam
>>      >>>  Revell wrote.
>>      >>>
>>      >>>  Another interesting technical point. In general, PIC
>>      and PGLS are the
>>      >>>  same, especially if you stick with a simple Brownian
>>      motion model of
>>      >>>  character evolution. However, their complete
>>      mathematical identity
>>      >>>  has not, to my knowledge, been proven.
>>      >>  I have a proof. I have a paper submitted to Sys. Biol.
>>      on the topic.
>>      >
>>      >You can find in my book the R code to get exactly the
>>      same coefficients
>>      >with PICs and PGLS. This works as long as the tree is
>>      ultrametric (for
>>      >equal variance). It's not a formal proof, of course, but
>>      a strong suspicion.
>>      >
>>      >>>  And, some things that can easily be done with PUC
>>      cannot easily be
>>      >>>  done with PGLS, partly because the proofs don't yet
>>      exist but also
>>      >>>  because the code to do it does not exist (so far as I
>>      am aware). For
>>      >>>  example, with PIC you can easily compute contrasts on
>>      a hierarchical
>>      >>>  tree for one trait, but on star with another. Or, you
>>      can use
>>      >>>  different sets of branch lengths for different traits
>>      >>>  biological or statistical sense to do. We have used
>>      this ability
>>      >>>  sometimes when we have "nuisance variables" that need
>>      to be
>>      >>>  incorporated in our set of independent variables in a
>>      multiple
>>      >>>  regression model. Given that the nuisance variable is
>>      not
>>      >>>  phylogenetically inherited, it makes sense to compute
>>      contrasts on a
>>      >>>  star. We have done t!
>>      >>>
>>      >>
>>      >>   From a biological point of view, I understand why you
>>      would want to do
>>      >>  that. But be aware that this means the resulting
>>      estimator will not have
>>      >>  minimum variance. Since the minimum variance property
>>      is a very useful
>>      >>  property of PIC/GLS, I don't recommend Ted's approach.
>>      Keep the same
>>      >>  tree for response AND explanatory variables. That way
>>      you are using the
>>      >>  Minimum Variance Unbiased Estimator (MVUE). If you
>>      really want to use
>>      >>  different trees/branch lengths for explanatory and
>>      response variables,
>>      >>  then you should ditch GLS/PIC and move to a Measurement
>>      Error (Errors in
>>      >>  Variables) model that explicitly incorporates
>>      covariance in explanatory
>>      >>  variables. We have developed Bayesian methods to fit ME
>>      models for
>>      >>  phylogenetic data in OpenBUGS and JAGS (submitted to
>>      Evolution). A good
>>      >>  introduction to ME models is Carrol, Ruppert,
>>      Stafanski, Crainiceanu
>>      >>  (2006). Measurement Error in Nonlinear Models. A Modern
>>      Perspective.
>>      >>  Chapman&  Hall.
>>      >
>>      >I share some of Simon's concern but for a different
>>      reason. I see no
>>      >problem to build complex models (and what Ted describes
>>      seems quite
>>      >exciting) as long as you do that in a hierarchical and
>>      consistent way
>>      complexity is
>>      >statistically significant. For instance, it is not clear
>>      to me how to
>>      >test that different predictors have distinct correlation
>>      structures.
>>      >
>>      >Emmanuel
>>      >
>>      >>>  hi!
>>      >>>  !
>>      >>>  s sort of thing here:
>>      >>>
>>      >>>  Wolf, C. M., T. Garland, Jr., and B. Griffith. 1998.
>>      Predictors of
>>      >>>  avian and mammalian translocation success: reanalysis
>>      with
>>      >>>  phylogenetically independent contrasts. Biological
>>      Conservation
>>      >>>  86:243-255.
>>      >>>  Perry, G., and T. Garland, Jr. 2002. Lizard home
>>      ranges revisited:
>>      >>>  effects of sex, body size, diet, habitat, and
>>      phylogeny. Ecology
>>      >>>  83:1870-1885.
>>      >>>
>>      >>>  All of this then relates back to what Liam wrote.
>>      Anyway, none of
>>      >>>  these questions are entirely simple, nor the answers
>>      entirely
>>      >>>  straightforward.
>>      >>>
>>      >>  If the answers were straightforward, it wouldn't be so
>>      much fun!
>>      >>
>>      >>  Cheers,
>>      >>
>>      >>  Simon.
>>      >>
>>      >>>  Cheers,
>>      >>>  Ted
>>      >>>
>>      >>>
>>      >>>  ---- Original message ----
>>      >>>
>>      >>>  Date: Fri, 24 Sep 2010 17:56:46 +0200
>>      >>>  From: Anne Kempel<kempel at ips.unibe.ch>
>>      >>>  Subject: Re: [R-sig-phylo] (no subject)
>>      >>>  To: tgarland at ucr.edu
>>      >>>
>>      >>>  >Hi Ted,
>>      >>>  >I guess I do regression models, since this GLS method
>>      >>>  only does
>>      >>>  >regressions (I have an interaction term, thats why I
>>      use
>>      >>>  the
>>      >>>  >phylogenetic GLS and not the PIC method). But
>>      >>>  biologically, for those
>>      >>>  >species traits I am analysing each one could be
>>      dependent
>>      >>>  - e.g. the
>>      >>>  >plants growth rate might depend on plant resistance,
>>      but
>>      >>>  plant
>>      >>>  >resistance also depends on the growth rate.
>>      >>>  >All the best,
>>      >>>  >anne
>>      >>>  >
>>      >>>  >
>>      >>>  >On 24.09.2010 17:33, tgarland at ucr.edu wrote:
>>      >>>  >>  Are you doing regression models or correlations per
>>      se?
>>      >>>  >>
>>      >>>  >>  Cheers,
>>      >>>  >>  Ted
>>      >>>  >>
>>      >>>
>>      >>>  Date: Fri 24 Sep 08:43:52 PDT 2010
>>      Book | This is
>>      >>>  Spam (on behalf of "Liam J.
>>      Revell"<lrevell at nescent.org>)
>>      >>>  Subject: Re: [R-sig-phylo] (no subject)
>>      >>>  To: Anne Kempel<kempel at ips.unibe.ch>
>>      >>>  Cc: r-sig-phylo at r-project.org
>>      >>>
>>      >>>  Hi Anne,
>>      >>>
>>      hypothesis - not on
>>      >>>  which trait shows phylogenetic signal. However, GLS
>>      "corrects for"
>>      >>>  non-independence in the residual error of y given X -
>>      non-independence
>>      >>>  which may be due to (for instance) phylogenetic
>>      history. Incidentally,
>>      >>>  if our observations for X are non-independent due to
>>      the phylogeny, but
>>      >>>  the residual error in y given X is uncorrelated, than
>>      GLS is not
>>      >>>  necessary (and will actually give us an estimate with
>>      inflated variance).
>>      >>>
>>      >>>  I have a paper about exactly this topic that was
>>      recently published
>>      >>>  online in the new journal "Methods in Ecology and
>>      Evolution." The
>>      >>>  citation is:
>>      >>>  Revell, L. J. 2010. Phylogenetic signal and linear
>>      regression on species
>>      >>>  data. /Methods in Ecology and Evolution/ Online Early
>>      View.
>>      >>>  And the article can be found at the following URL:
>>      >>>  http://dx.doi.org/10.1111/j.2041-210X.2010.00044.x or
>>      on my website (URL
>>      >>>  below).
>>      >>>
>>      >>>  I hope this is of some help.
>>      >>>
>>      >>>  - Liam
>>      >>>
>>      >>>  Liam J. Revell
>>      >>>  NESCent, Duke University
>>      >>>  web: http://anolis.oeb.harvard.edu/~liam/
>>      >>>  NEW email: lrevell at nescent.org
>>      >>>
>>      >>>  _______________________________________________
>>      >>>  R-sig-phylo mailing list
>>      >>>  R-sig-phylo at r-project.org
>>      >>>  https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
>>      >>>
>>      >>
>>      >
>>      >--
>>      >IRD, Montpellier, France
>>      >  ph: +33 (0)4 67 16 64 47
>>      >  fax: +33 (0)4 67 16 64 40
>>      >http://ape.mpl.ird.fr/
>>      >
>>      >_______________________________________________
>>      >R-sig-phylo mailing list
>>      >R-sig-phylo at r-project.org
>>      >https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
>>
>> _______________________________________________
>> R-sig-phylo mailing list
>> R-sig-phylo at r-project.org
>> https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
>>
>>
>
>

--