[R-sig-phylo] ancestral reconstruction of geographic origin (function ace{ape})
Emmanuel Paradis
Emmanuel.Paradis at mpl.ird.fr
Wed Feb 10 18:30:49 CET 2010
Hi Andrew,
Andrew Rominger wrote on 08/02/2010 22:08:
> Hello,
>
> I'm trying to reconstruct the major biogeographic region from which
> different families have originated. I have data on the proportion of each
> family's range that falls within each biogeographic region, as well as (of
> course) a phylogeny.
>
> My approach so far has been to assign, if possible, each family to ONE
> biogeographic region. I do so by finding which region contains >= 50% of
> the family's range. The problem (potentially among many!) is that some
> families occur in multiple regions and no single region contains >= 50% of
> their range. So I have assigned these families to a new category
> "cosmopolitan." I then have a discrete variable of "biogeographic affinity"
> and reconstruct this along the phylogeny using the ace function in package
> ape.
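The assignment rule described above can be sketched in base R. A family goes to the region that holds at least 50% of its range, and otherwise to "cosmopolitan". The proportions and the family and region names are made up for illustration. `assign_affinity()` is a hypothetical helper and is not part of ape.

```r
# Made-up range proportions: rows are hypothetical families, columns are
# regions, and each row sums to 1.
range_prop <- matrix(
  c(1.00, 0.00, 0.00,
    0.40, 0.35, 0.25,
    0.00, 0.40, 0.60),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("famA", "famB", "famC"),
                  c("Asia", "Europe", "Africa"))
)

# Assign each family to the region holding >= threshold of its range,
# or to "cosmopolitan" when no single region reaches the threshold.
assign_affinity <- function(prop, threshold = 0.5) {
  apply(prop, 1, function(p) {
    best <- which.max(p)
    if (p[best] >= threshold) colnames(prop)[best] else "cosmopolitan"
  })
}

affinity <- assign_affinity(range_prop)
print(affinity)
```

`which.max()` returns the first maximum. A family split exactly 50/50 between two regions is therefore assigned to the first of them.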
Your problem may be attacked in different ways. Here are two suggestions.
1. Instead of considering range as a single multi-state variable,
consider presence/absence within each region, resulting in as many
binary variables as there are regions. The advantage is that it might be
easier to fit these 2-state models than a single many-state model (so
the parameters may be more 'efficiently' estimated). You could even
interpret the parameters as rates of local extinction and colonization.
2. Consider your original multi-state variable, but build your model with
0s in the transition matrix to disallow some transitions. It seems
realistic to assume that a taxon can expand or contract its range, but
not 'move' it from one region to another. For instance, with 2 regions:
> Q <- matrix(0, 3, 3)
> rownames(Q) <- colnames(Q) <- c("Asia", "Eurasia", "Europe")
> Q[c(2, 4, 6, 8)] <- 1
> Q
        Asia Eurasia Europe
Asia       0       1      0
Eurasia    1       0      1
Europe     0       1      0
With this model all allowed transitions have the same rate (so only one
parameter to estimate; the diagonal is ignored), but you can change this
by modifying the appropriate indices in Q (this is explained in ?ace).
With this approach, you prevent ancestors from having disjoint ranges (e.g.,
present in New Zealand and in France), which could happen with the first
one. However, it might be very cumbersome to define the states if there
are a lot of regions. An alternative could be, if all regions are
'equal', to define the states as "present in 1, 2, 3, ... regions" and
to allow only transitions 1 <-> 2, 2 <-> 3, ...
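Both suggestions above can be sketched in base R. This is a minimal sketch and not code from ape. The proportions are made up, and `ordered_range_model()`, `range_prop`, `presence` and `phy` are hypothetical names.

The sketch builds one presence/absence variable per region (suggestion 1). It also builds an index matrix for the "number of regions occupied" variant of suggestion 2. In that matrix, 0 disallows a transition and equal positive integers share one rate, which is the convention ?ace describes for a user-defined `model`. Rows are assumed to be the starting state and columns the end state. The `ace()` calls need ape and a real tree, so they are left as comments.

```r
# Made-up range proportions for three hypothetical families (rows) in
# three regions (columns).
range_prop <- matrix(
  c(1.00, 0.00, 0.00,
    0.40, 0.35, 0.25,
    0.00, 0.40, 0.60),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("famA", "famB", "famC"),
                  c("Asia", "Europe", "Africa"))
)

# Suggestion 1: one binary presence/absence variable per region
# (1 = some part of the family's range falls in that region).
presence <- 1L * (range_prop > 0)

# Suggestion 2 (ordered variant): states are "present in 1, ..., k
# regions"; only adjacent transitions i <-> i + 1 are allowed.
# same_rate = TRUE uses one rate for everything; FALSE gives separate
# rates for gaining (index 1) and losing (index 2) a region.
ordered_range_model <- function(k, same_rate = TRUE) {
  Q <- matrix(0L, k, k, dimnames = list(1:k, 1:k))
  for (i in seq_len(k - 1)) {
    Q[i, i + 1] <- 1L                         # gain one region
    Q[i + 1, i] <- if (same_rate) 1L else 2L  # lose one region
  }
  Q
}

n_regions <- rowSums(presence)  # observed state of each family
Q_equal <- ordered_range_model(ncol(presence))
Q_diff  <- ordered_range_model(ncol(presence), same_rate = FALSE)

# Not run: needs ape and a tree 'phy' whose tip labels are the row names.
# The matrix dimension must match the number of states observed at the tips.
#   for (r in colnames(presence))
#     print(ace(presence[phy$tip.label, r], phy, type = "discrete", model = "ARD"))
#   fit1 <- ace(n_regions[phy$tip.label], phy, type = "discrete", model = Q_equal)
#   fit2 <- ace(n_regions[phy$tip.label], phy, type = "discrete", model = Q_diff)
```

`Q_equal` and `Q_diff` form a nested pair: the first is the second with the gain and loss rates forced to be equal.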
In all cases you can fit a set of nested models and compare them with
anova().
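Comparing nested fits comes down to a likelihood-ratio test. The sketch below shows that computation in base R, given each model's maximised log-likelihood and number of free rates. `lrt()` is a hypothetical helper and not ape's own `anova` method. It also assumes that fitted `ace()` objects store these values as `$loglik` and `$rates`.

```r
# Likelihood-ratio test of a simpler model against a more complex model
# in which it is nested (the simpler one is a special case of it).
lrt <- function(loglik_simple, loglik_complex, npar_simple, npar_complex) {
  stat <- 2 * (loglik_complex - loglik_simple)  # LR statistic
  df <- npar_complex - npar_simple              # number of extra free rates
  c(statistic = stat, df = df,
    p.value = pchisq(stat, df = df, lower.tail = FALSE))
}

# Not run, with two ace() fits such as an equal-rate and a two-rate model:
#   lrt(fit1$loglik, fit2$loglik, length(fit1$rates), length(fit2$rates))
```

For example, testing a single shared rate against separate gain and loss rates gives df = 1.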
My point is just that being able to fix some rates to zero in the
transition matrix opens up a lot of possibilities (just use your imagination).
> I assume that each ancestor is from a single region and so I take the scaled
> likelihood for the cosmopolitan state and redistribute it evenly to all
> other states.
Do you want to assume it or test it? Since some taxa are already
distributed in several regions, it seems clear that your assumption will
be rejected. If you want to assume such a "single-region" state, you have
to use a different method than what is currently in ace().
> My rationale is that from a coalescent perspective the
> ancestor is a single individual so being "cosmopolitan" is not possible.
I think you are confusing this with the coalescent: the "single individual" is
an MRCA, not an image of what the ancestral population was.
> That being said, I have no idea if this practice of "redistributing" scaled
> likelihoods is legitimate, and my intuition is that it is not. So...does
> anyone know of a better way? I am not wedded to using a discrete state
> reconstruction, and if it's possible to explicitly reconstruct the
> proportion of each range in each region that would probably be better. My
> concern there would be that the reconstructed "proportions" would not sum to
> 1 at every node, and then what--rescale them? That seems no better than
> redistributing likelihoods in the discrete case. But I don't know!
If I were you, I'd try one of the above rather than "redistributing
scaled likelihoods".
HTH
Emmanuel
> Thanks in advance for any help--
> Andy
>
> _______________________________________________
> R-sig-phylo mailing list
> R-sig-phylo at r-project.org
> https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
>
--
Emmanuel Paradis
IRD, Montpellier, France
ph: +33 (0)4 67 16 64 47
fax: +33 (0)4 67 16 64 40
http://ape.mpl.ird.fr/