[R-sig-phylo] ancestral reconstruction of geographic origin (function ace{ape})

Emmanuel Paradis Emmanuel.Paradis at mpl.ird.fr
Wed Feb 10 18:30:49 CET 2010

Hi Andrew,

Andrew Rominger wrote on 08/02/2010 22:08:
> Hello,
> I'm trying to reconstruct the major biogeographic region from which
> different families have originated.  I have data on the proportion of each
> family's range that falls within each biogeographic region, as well as (of
> course) a phylogeny.
> My approach so far has been to assign, if possible, each family to ONE
> biogeographic region.  I do so by finding which region contains >= 50% of
> the family's range.  The problem (potentially among many!) is that some
> families occur in multiple regions and no single region contains >= 50% of
> their range.  So I have assigned these families to a new category
> "cosmopolitan."  I then have a discrete variable of "biogeographic affinity"
> and reconstruct this along the phylogeny using the ace function in package
> ape.
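
For reference, the assignment rule described above could be sketched as follows. The matrix `prop`, the family and region names, and the helper `assign_region()` are all hypothetical and only illustrate the ">= 50% or cosmopolitan" coding.

```r
# Hypothetical proportions of each family's range per region (rows sum to 1).
prop <- matrix(c(0.7, 0.2, 0.1,
                 0.4, 0.3, 0.3,
                 0.0, 0.1, 0.9),
               ncol = 3, byrow = TRUE,
               dimnames = list(c("FamA", "FamB", "FamC"),
                               c("Asia", "Europe", "Africa")))

# Return the region holding at least `threshold` of the range,
# or "cosmopolitan" when no single region reaches it.
assign_region <- function(p, threshold = 0.5) {
  top <- which.max(p)
  if (p[top] >= threshold) names(p)[top] else "cosmopolitan"
}

# One discrete "biogeographic affinity" state per family.
affinity <- factor(apply(prop, 1, assign_region))
affinity
```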

Your problem may be attacked in different ways. Here are two suggestions.

1. Instead of considering range as a single multi-state variable, 
consider presence/absence within each region, resulting in as many 
binary variables as there are regions. The advantage is that it might be 
easier to fit these 2-state models than a single many-state model (so 
the parameters may be more 'efficiently' estimated). You could even 
interpret the parameters as local rates of extinction.
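
A minimal sketch of this first suggestion, assuming a hypothetical matrix `prop` of range proportions (families in rows, regions in columns) and a toy tree; neither comes from the data discussed here. Each region is recoded as present/absent and fitted separately with ace(). The equal-rates model ("ER") is used only to keep the example small; model = "ARD" would estimate gain and loss rates separately.

```r
library(ape)

# Hypothetical toy tree with 6 families (not the tree from this thread).
tree <- read.tree(text = "((FamA:1,FamB:1):1,((FamC:1,FamD:1):0.5,(FamE:1,FamF:1):0.5):0.5);")

# Hypothetical range proportions; each row sums to 1.
prop <- matrix(c(1.0, 0.0, 0.0,
                 0.6, 0.4, 0.0,
                 0.0, 1.0, 0.0,
                 0.3, 0.3, 0.4,
                 0.0, 0.2, 0.8,
                 0.0, 0.0, 1.0),
               ncol = 3, byrow = TRUE,
               dimnames = list(tree$tip.label,
                               c("Asia", "Europe", "Africa")))

# One binary variable per region: is the family present there at all?
presence <- prop > 0

# Fit a separate 2-state model for each region.
fits <- lapply(colnames(presence), function(r) {
  x <- setNames(ifelse(presence[, r], "present", "absent"),
                rownames(presence))
  ace(x, tree, type = "discrete", model = "ER")
})
names(fits) <- colnames(presence)

# Log-likelihood of each regional model; node-wise state likelihoods
# are in fits[[region]]$lik.anc.
sapply(fits, function(f) f$loglik)
```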

2. Consider your original multi-state variable but build your model with 
0's in the transition matrix to disallow some transitions. It seems 
realistic to assume that a taxon can expand or contract its range, but 
not 'move' it from one region to another. For instance, with 2 regions 
(so three states, "Eurasia" meaning present in both):

 > Q <- matrix(0, 3, 3)
 > rownames(Q) <- colnames(Q) <- c("Asia", "Eurasia", "Europe")
 > Q[c(2, 4, 6, 8)] <- 1
 > Q
         Asia Eurasia Europe
Asia       0       1      0
Eurasia    1       0      1
Europe     0       1      0

With this model all allowed transitions have the same rate (so only one 
parameter to estimate; the diagonal is ignored), but you can change this 
by modifying the appropriate indices in Q (this is explained in ?ace). 
With this approach you prevent ancestors from having disjoint ranges (e.g., 
present in New Zealand and in France), which could happen with the first 
suggestion. However, it might be very cumbersome to define the states if there 
are a lot of regions. An alternative could be, if all regions are 
'equal', to define the states as "present in 1, 2, 3, ... regions" and 
allowing only transitions 1 <-> 2, 2 <-> 3, ...
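
A minimal sketch of how such a matrix might be passed to ace(), assuming a toy tree and made-up tip states (both hypothetical). The factor levels are set explicitly so that they match the row and column order of Q. The second part builds the "number of regions occupied" matrix with only neighbouring transitions allowed; `k` is an arbitrary example value.

```r
library(ape)

# Hypothetical toy tree and tip states (not data from this thread).
tree <- read.tree(text = "((FamA:1,FamB:1):1,((FamC:1,FamD:1):0.5,(FamE:1,FamF:1):0.5):0.5);")
x <- factor(c(FamA = "Asia", FamB = "Asia", FamC = "Eurasia",
              FamD = "Eurasia", FamE = "Europe", FamF = "Europe"),
            levels = c("Asia", "Eurasia", "Europe"))

# Same matrix as above: one shared rate (index 1), 0 = transition disallowed.
Q <- matrix(0, 3, 3)
rownames(Q) <- colnames(Q) <- c("Asia", "Eurasia", "Europe")
Q[c(2, 4, 6, 8)] <- 1

# A numeric matrix given as 'model' defines a user-specified model.
fit <- ace(x, tree, type = "discrete", model = Q)
fit$lik.anc   # scaled likelihoods of each state at each internal node

# Alternative coding: states are "present in 1, 2, ..., k regions",
# with transitions allowed only between neighbouring counts.
k <- 4
Qn <- matrix(0, k, k, dimnames = list(1:k, 1:k))
Qn[abs(row(Qn) - col(Qn)) == 1] <- 1
Qn
```

Using different integers in Q (e.g., 1 for expansions and 2 for contractions) would give those groups of transitions separate rates.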

In all cases you can fit a set of nested models and compare them with 

My point is just that the possibility to fix some rates to zero in the 
transition matrix gives a lot of possibilities (just use your imagination).
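
One common way to compare two nested fits is a likelihood-ratio test, optionally alongside AIC. The sketch below assumes that choice and uses made-up log-likelihoods standing in for the `loglik` element of two ace() fits; the chi-squared reference distribution is only an approximation, particularly on small trees.

```r
# Hypothetical log-likelihoods, e.g. taken from fit$loglik of two ace() fits.
ll_simple <- -12.4   # nested model, 1 rate parameter
ll_full   <- -10.1   # more general model, 2 rate parameters
k_simple  <- 1
k_full    <- 2

# Likelihood-ratio statistic and its approximate chi-squared p-value.
lr <- 2 * (ll_full - ll_simple)
p_value <- pchisq(lr, df = k_full - k_simple, lower.tail = FALSE)

# AIC = 2k - 2 logLik; the smaller value is preferred.
aic <- c(simple = 2 * k_simple - 2 * ll_simple,
         full   = 2 * k_full   - 2 * ll_full)

lr
p_value
aic
```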

> I assume that each ancestor is from a single region and so I take the scaled
> likelihood from the cosmopolitan state and redistribute it evenly to all
> other states.

Do you want to assume it or test it? Since some taxa are already 
distributed in several regions, it seems clear that your assumption will 
be rejected. If you want to assume such a "single-region" state you have 
to use a different method than what is currently in ace().

> My rationale is that from a coalescent perspective the
> ancestor is a single individual so being "cosmopolitan" is not possible.

I think you are confusing this with the coalescent: the "single individual" is 
an MRCA, not a picture of what the ancestral population was.

> That being said, I have no idea if this practice of "redistributing" scaled
> likelihoods is legitimate, and my intuition is that it is not.  So...does
> anyone know of a better way?  I am not wedded to using a discrete state
> reconstruction, and if it's possible to explicitly reconstruct the
> proportion of each range in each region that would probably be better.  My
> concern there would be that the reconstructed "proportions" would not sum to
> 1 at every node, and then what--rescale them?  That seems no better than
> redistributing likelihoods in the discrete case.  But I don't know!

If I were you, I'd try one of the above rather than "redistributing 
scaled likelihoods".



> Thanks in advance for any help--
> Andy

Emmanuel Paradis
IRD, Montpellier, France
   ph: +33 (0)4 67 16 64 47
  fax: +33 (0)4 67 16 64 40
