[R-sig-phylo] R: Re: R: ancestral state reconstruction for tips

Morgan Langille morgan.g.i.langille at gmail.com
Fri Aug 5 20:40:39 CEST 2011

Hello everyone. I just wanted to say thank you for all of the
responses so far; I have thoroughly enjoyed the discussion. For
reference, I thought I would explain in more detail what I am doing
and why I asked my original question.

I am interested in developing a practical method for using 16S
sequences to infer function. The aim is to aid metagenomic
experiments in which we would like to compare the "observed" functions
(the quantity of a particular protein family) in a metagenomic sample
with what we would "expect" given the species that we believe are
present in the sample based on 16S sequences (or possibly some other
marker down the road). The current pipeline starts with a 16S
reference tree for all completed archaeal and bacterial genomes (~1400
species). We know the functions within these genomes, so I would like
to leverage that information along with the tree to predict (as best
as possible) what the functions would be for a species newly placed on
that tree.

One method would be simply to take the "nearest neighbour" (the
reference species separated from the new species by the minimum 16S
branch length) and use the functions encoded in that genome as a
representative. However, this is very naive. I therefore turned to
ancestral state reconstruction and to current methods (if any) for
predicting characters for species in which those traits have not been
observed.
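
As a rough illustration of the nearest-neighbour baseline described
above, here is a minimal Python sketch. Everything in it is an
assumption made for illustration: the toy tree, the species labels
("A", "B", "C", "new"), the family names ("fam1", ...) and the helper
functions are hypothetical and are not part of the actual pipeline.
Distance is taken to be the patristic distance, i.e. the sum of branch
lengths along the path between two tips.

```python
# Hypothetical toy tree, stored as child -> (parent, branch length).
# These are made-up values, not real 16S branch lengths.
TREE = {
    "n1": ("root", 0.20),
    "C": ("root", 0.50),
    "n2": ("n1", 0.05),
    "B": ("n1", 0.30),
    "A": ("n2", 0.10),
    "new": ("n2", 0.02),  # the newly placed species
}

# Hypothetical function content (e.g. protein families) of the
# reference genomes; "new" has no known functions.
KNOWN = {
    "A": {"fam1", "fam2"},
    "B": {"fam2", "fam3"},
    "C": {"fam4"},
}


def path_to_root(tree, node):
    """Map the node and each of its ancestors to its distance from the node."""
    dist = {node: 0.0}
    total = 0.0
    while node in tree:
        parent, length = tree[node]
        total += length
        dist[parent] = total
        node = parent
    return dist


def patristic_distance(tree, a, b):
    """Sum of branch lengths on the path between tips a and b."""
    da = path_to_root(tree, a)
    db = path_to_root(tree, b)
    # The most recent common ancestor is the shared ancestor that
    # minimises the summed distance to both tips.
    return min(da[n] + db[n] for n in da if n in db)


def nearest_neighbour(tree, query, references):
    """Reference tip with the smallest patristic distance to the query."""
    return min(references, key=lambda ref: patristic_distance(tree, query, ref))


def predict_functions(tree, query, known):
    """Naive prediction: copy the function set of the nearest reference."""
    nn = nearest_neighbour(tree, query, known)
    return nn, set(known[nn])


if __name__ == "__main__":
    neighbour, functions = predict_functions(TREE, "new", KNOWN)
    print(neighbour, sorted(functions))
```

The sketch copies the whole function set from a single genome and
ignores how far away that genome is, which is exactly why the approach
is naive: a distant nearest neighbour is treated the same as a close
one.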

I realize that caution has to be used when predicting these functions,
as mentioned below by Pasquale, but I am mostly searching for some
"best practices" to use in my current situation. The results should
be interesting since I will be testing how well the method does across
~10000 functions (e.g. PFAMs). Many of these are not predictable at
all since their phylogenetic signal is basically nil due to horizontal
gene transfer. However, I am optimistic that many functions will be
reliably predictable.


Morgan Langille

On Fri, Aug 5, 2011 at 2:51 PM, Joe Felsenstein <joe at gs.washington.edu> wrote:
> Pasquale Raia said:
>> Of course Ted is right, but my problem with this computation, or with
>> the simple exercise I was proposing is well another: as a
>> paleontologist I often come across pretty exceptional phenotypes
>> (dwarf hippos and elephants, huge flightless birds, to make a few
>> examples). When you use methods like this (I mean Garland and Ives')
>> and compare the output with those phenotypes, as I did, you
>> immediately realize what the bottom line is: no matter if they are
>> nodes or tips, by using the expected (under BM) covariance the
>> estimated phenotypes are dull, perfectly reasonable but very
>> different from anything exceptional you may find yourself to work
>> with. This is why I feel it is difficult to rely on those
>> (unobserved) values to begin with.
> I think that what is being said is that Brownian Motion is too sedate
> a process and does not predict some of the large changes actually seen
> in the fossil record.
> That's a legitimate point but does put the onus on the maker of the
> point to propose some other stochastic process that is tractable and
> has these large changes (and that fits with known Mendelian and
> Darwinian mechanisms).
> Just complaining that the Brownian stochastic process is no good is
> insufficient.
> If we want to add the fossils to the calculation, then they will of
> course pressure the Brownian Motion process to change more in their
> vicinity, which may help some.
> Joe
> ----
> Joe Felsenstein      joe at gs.washington.edu
>  Dept of Genome Sciences and Dept of Biology, Univ. of Washington,
> Box 5065, Seattle Wa 98195-5065
> _______________________________________________
> R-sig-phylo mailing list
> R-sig-phylo at r-project.org
> https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
