[R-sig-phylo] diversification analysis questions
drabosky at berkeley.edu
Wed Nov 2 15:02:21 CET 2011
In order to test for variation in diversification rates through time, you have to make some strong assumptions about the nature of your taxon sampling. This is because sampling only a fraction of the species within a clade will "look like" decreasing speciation through time, even if the true underlying process involves constant diversification rates. The problem is exacerbated if taxon sampling is not random with respect to phylogeny. If you have sampled representatives of families or genera, for example, you will be biased towards oversampling early divergences in the tree (because a genus-level tree presumably contains a "phylogenetically overdispersed" subset of the total set of species).
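The effect described above can be illustrated with a small simulation. What follows is only a minimal sketch in plain Python, not anything from LASER or TreeSim: it grows constant-rate pure-birth trees, keeps a random subset of tips, and compares the gamma statistic (Pybus and Harvey 2000) before and after pruning. All function names are hypothetical, and the tree sizes (2500 species, 144 sampled) are borrowed from the question quoted below purely for illustration.

```python
import math
import random


def simulate_yule(n_tips, lam, rng):
    """Grow a pure-birth tree forward in time until it has n_tips lineages."""
    parent = [None, 0, 0]            # node 0 is the root; 1 and 2 are its daughters
    split_time = [0.0, None, None]   # time each node speciated (None for tips)
    active = [1, 2]
    t = 0.0
    while len(active) < n_tips:
        t += rng.expovariate(lam * len(active))
        i = rng.randrange(len(active))
        node = active[i]
        split_time[node] = t
        left = len(parent)
        parent.extend([node, node])
        split_time.extend([None, None])
        active[i] = left
        active.append(left + 1)
    # The present is placed where the next speciation would have occurred.
    present = t + rng.expovariate(lam * len(active))
    return parent, split_time, active, present


def sampled_branching_times(parent, split_time, sampled_tips):
    """Branching times of the subtree induced by the sampled tips."""
    has_sample = [False] * len(parent)
    for tip in sampled_tips:
        has_sample[tip] = True
    daughters_with_sample = [0] * len(parent)
    # Daughters always have larger indices than their parent, so a reverse
    # sweep visits every node after all of its descendants.
    for node in range(len(parent) - 1, 0, -1):
        if has_sample[node]:
            has_sample[parent[node]] = True
            daughters_with_sample[parent[node]] += 1
    # A node survives pruning only if both daughters lead to a sampled tip.
    return sorted(split_time[v] for v in range(len(parent))
                  if daughters_with_sample[v] == 2)


def gamma_statistic(branch_times, present):
    """Gamma statistic from sorted branching times (root first)."""
    n = len(branch_times) + 1        # number of tips
    if n < 3:
        raise ValueError("need at least 3 tips")
    events = list(branch_times) + [present]
    # k * g_k, where g_k is the time spent with k lineages, k = 2..n
    weighted = [k * (events[k - 1] - events[k - 2]) for k in range(2, n + 1)]
    total = sum(weighted)
    running = 0.0
    partial_sum = 0.0
    for w in weighted[:-1]:          # i = 2..n-1
        running += w
        partial_sum += running
    return (partial_sum / (n - 2) - total / 2) / (total * math.sqrt(1 / (12 * (n - 2))))


rng = random.Random(1)
N_TOTAL, N_SAMPLED, REPS = 2500, 144, 30
full, pruned = [], []
for _ in range(REPS):
    parent, split_time, tips, present = simulate_yule(N_TOTAL, 1.0, rng)
    full.append(gamma_statistic(
        sampled_branching_times(parent, split_time, tips), present))
    subset = rng.sample(tips, N_SAMPLED)
    pruned.append(gamma_statistic(
        sampled_branching_times(parent, split_time, subset), present))

print("mean gamma, complete trees:      ", sum(full) / REPS)
print("mean gamma, 144 of 2500 sampled: ", sum(pruned) / REPS)
```

Under these assumptions the complete trees should give gamma values centred near zero, while the pruned trees should give clearly negative values, even though the speciation rate never changed during the simulation.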
To meet the assumptions of the analyses, you'll have to have one of the following:
(1) Complete or mostly complete taxon sampling. Missing 5% of the species from a clade at random will have very little effect on your analyses even if you ignore the missing taxa, but missing 95% will have an enormous effect.
(2) Incomplete taxon sampling, but taxon sampling is random. However, even if taxon sampling is random, I still have little confidence in "through time" analyses of diversification if fewer than 40-50% of species have been sampled.
(3) You could have incomplete and non-random taxon sampling, but account for the nature of the non-randomness in the model itself (or in the generation of the null distribution). This option often isn't practical and hasn't been explored much in the literature.
With highly incomplete taxon sampling, and presumably non-random taxon sampling as well, I think it will be difficult to say much about rates of diversification through time. It doesn't matter whether you look at LTT plots, use the gamma statistic, or fit explicit speciation-extinction models to the data - the taxon sampling issues will affect all approaches in a similar fashion.
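To make option (3) above a little more concrete, here is a hedged sketch of building the sampling scheme into the null distribution. It assumes the most extreme form of phylogenetically overdispersed sampling, in which one representative is kept from each of the n oldest lineages, so the sampled tree retains exactly the n-1 oldest splits of the full tree. That sampling model is an assumption made for illustration, not something established in this thread, and the function names are hypothetical. It is plain Python rather than LASER or TreeSim code.

```python
import math
import random


def yule_branching_times(n_tips, lam, rng):
    """Speciation times of a pure-birth tree grown to n_tips (root at time 0)."""
    times = [0.0]
    t = 0.0
    for k in range(2, n_tips):       # waiting time while k lineages exist
        t += rng.expovariate(lam * k)
        times.append(t)
    present = t + rng.expovariate(lam * n_tips)
    return times, present


def gamma_statistic(branch_times, present):
    """Gamma statistic from sorted branching times (root first)."""
    n = len(branch_times) + 1
    if n < 3:
        raise ValueError("need at least 3 tips")
    events = list(branch_times) + [present]
    weighted = [k * (events[k - 1] - events[k - 2]) for k in range(2, n + 1)]
    total = sum(weighted)
    running = 0.0
    partial_sum = 0.0
    for w in weighted[:-1]:
        running += w
        partial_sum += running
    return (partial_sum / (n - 2) - total / 2) / (total * math.sqrt(1 / (12 * (n - 2))))


def diversified_null_gammas(n_total, n_sampled, reps, rng, lam=1.0):
    """Null gamma values when only the n_sampled - 1 oldest splits are retained."""
    gammas = []
    for _ in range(reps):
        times, present = yule_branching_times(n_total, lam, rng)
        gammas.append(gamma_statistic(times[:n_sampled - 1], present))
    return gammas


def lower_quantile(values, q):
    """Simple empirical quantile: the value at rank floor(q * len(values))."""
    ordered = sorted(values)
    return ordered[int(q * len(ordered))]


rng = random.Random(2011)
null = diversified_null_gammas(2500, 144, 500, rng)
critical = lower_quantile(null, 0.05)
print("5% critical gamma under constant rates with overdispersed sampling:", critical)
```

An observed gamma would need to fall below this critical value before a slowdown could be distinguished from a sampling artifact under the assumed scheme. The value of `lam` does not matter here, because gamma is unchanged when all branch lengths are rescaled.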
On Nov 1, 2011, at 3:50 PM, Patricia Cabezas wrote:
> Dear all,
> I am working with a crustacean dataset. The infraorder includes 2500
> species, and we have sampled 144 species, including all extant families, for
> a total of 5 genes. Thus, our taxon sampling is far from exhaustive. I
> don't know if it is related, but I am having trouble running the analysis in
> LASER to estimate the critical value for the MCCRtest (NumberMissing=2307)
> and simulating Yule trees under a pure-birth model in TreeSim (frac=0.058).
> With TreeSim, if I increase the value of frac, for example up to 0.8, I can
> run the analysis.
> So far, I have explored the estimated chronogram from BEAST (without
> outgroups) with different diversification analysis approaches. First,
> whole-tree statistics from SYMMETREE suggest a shift in diversification
> rate, but the delta statistic does not find statistical support for a
> single shift point of hypothesis of constant diversification. I have also
> made an LTT plot, and I find a pattern of accelerated diversification,
> with a clear shift around 40 Ma.
> Next, I used LASER to compare pure-birth vs birth-death models. The
> likelihood value is exactly the same for both models, and the extinction
> rate estimated for the birth-death one is zero. I then fitted a set of
> rate-constant and rate-variable models. The model with the lowest AIC
> value is Yule3, but I cannot find out whether this is significant
> because I cannot generate the null distribution of the dAICrc test statistic
> (the problem I mentioned above with simulating phylogenies under a Yule
> process in TreeSim). Moreover, the parameter values for the Yule3 model are:
> Is the value of r2 normal? It seems out of range. And how is it possible
> that both shifts are detected at the same age?
> Thanks in advance,