[R-sig-phylo] function to insert missing tips in a clade

François Michonneau francois.michonneau at gmail.com
Thu Jan 19 16:45:28 CET 2012

Hi Robin,

  Please find below a function I wrote that does what you want. Let me
know if you have any questions, if you find any bugs or if you have
any suggestions to make it better.

   -- François

##' Graft unsampled taxa onto a tree by simulating random trees.
##' Based on a data frame containing the number of species at the tip
of a phylogeny,
##' this function will generate random phylogenies for each group of
missing species
##' and graft them onto the tree. They are not grafted directly at the
tip of the existing
##' tree. Instead a random fraction of the existing branch length is
removed and the
##' simulated tree, representing the missing taxa, is scaled to have
the same depth (length)
##' as the fraction that is removed. In the end, an ultrametric tree
with the same depth (length)
##' as the original object is generated.
##' @title Graft unsample taxa onto a tree.
##' @param tr a phylogenetic tree stored as a \code{phylo} object.
##' @param listSp a single-column data frame which contains the number
of species unsampled
##' at each tip. The labels of the tree \code{tr} and the row names of
\code{listSp} need to
##' match.
##' @return a phylogenetic tree stored as a \code{phylo} object. The
tips for which species have
##' been added too are appended a number (starting at 1) for each clade.
##' @author François Michonneau
##' @examples
##' tr <- rcoal(10)
##' listSp <- data.frame(sp=sample(1:10), row.names=tr$tip.label)
##' trWithGraft <- graftMissingTaxa(tr, listSp)
##' par(mfrow=c(1,2))
##' plot(tr)
##' plot(trWithGraft)
graftMissingTaxa <- function(tr, listSp) {

  if (class(tr) != "phylo") stop("\'tr\' must be an object of class \'phylo\'")
  if (!is.ultrametric(tr))
    warning("This function is intended to be used on ultrametric trees.")

  spToAdd <- subset(listSp, listSp[,1] > 1)

  for (i in 1:nrow(spToAdd)) {
    trP4 <- as(tr, "phylo4")
    tipToAdd <- spToAdd[i,1]
    tipName <- rownames(spToAdd)[i]
    rTr <- as(rcoal(tipToAdd), "phylo4")   # draw random tree (coal to
be ultrametric)
    tipLabels(rTr) <- paste(tipName, 1:tipToAdd, sep="_")
    distToTip <- edgeLength(trP4, getNode(trP4, tipName))
    newDist <- runif(1) * distToTip
    totalDist <- distToTip - newDist
    edgeLength(rTr) <- edgeLength(rTr) * totalDist /
sumEdgeLength(rTr, ancestors(rTr, 1, "ALL"))
    edgeLength(trP4)[getEdge(trP4, tipName)] <- unname(newDist)
    tr <- bind.tree(as(trP4, "phylo"), as(rTr, "phylo"),
match(tipName, tr$tip.label))

On Wed, Jan 18, 2012 at 11:41, Velzen, Robin van <Robin.vanVelzen at wur.nl> wrote:
> Dear all,
> I wish to do various lineage diversification analyses which assume complete sampling. My sampling is not complete but I do know the likely clade membership of the missing tips. As far as I know two different approaches to inserting these tips have been proposed: I. A skeletal tree approach collapsing the under sampled clades and randomly resolving them with the complete number of tips (FitzJohn, Maddison & Otto (2009) Syst Biol 58: 595-611). II. An expanded tree approach where missing tips are randomly inserted with equal probability along the branches belonging to its likely clade (e.g. Day, Cotton & Barraclough (2008) PLoS ONE 3(3): e1730). The first approach is straightforward to implement (it is implemented in the diversitree package).  I find the second more challenging, however, and hope you can help me improve the solution I have found so far.
> Basically what I want to do is randomly insert a number of missing taxa to a specified clade in a tree. Below you can find a "work in progress" simple function that relies on the ape and phylobase packages:
> library(ape)
> library(phylobase)
> tree<-compute.brlen(read.tree(text="(t1,((t2,t3),((t6,(t4,t5)),((t7,t8),(t9,t10)))));"))
> clade<-c("t4", "t5","t6")
> insert.tips<-function(tree,clade,n){
>   T<-tree
>   for (i in 1:n){
>     T4<-phylo4(T)
>     nodes<-descendants(T4,MRCA(T4,clade), type="all")
>     insertnode<-sample(nodes, 1, prob=T$edge.length[match(nodes,T$edge[,2])])
>     insertnodelength<-T$edge.length[match(insertnode,T$edge[,2]) ]
>     insertposition<-runif(1, min=0, max=insertnodelength)
>     new.tip<-compute.brlen(stree(1, tip.label = "z"),
> min(dist.nodes(T)[insertnode,1:length(T$tip.label)])+insertposition)
>     T<-(bind.tree(T,new.tip,where=insertnode, position=insertposition ) )
>     T$tip.label<-make.unique(T$tip.label, sep="")
>   }
> return(T)
> }
> plot(insert.tips(tree,clade, 5))
> Any comments or suggestions to improve the code would be very welcome. I am a novice and have been struggling especially with finding ways to keep the tree ultrametric. As you can see insert nodes are currently randomly selected in proportion to their branch length which seems logical to me but I am open to suggestions. Also, if the code can be improved in terms of efficiency that would be of great help because I will need to process a large number of trees.
> At a higher level, I am wondering if this approach would bias any Yule or birth-death models fitted over the tree?
> Many thanks!
> Robin
> PhD candidate
> Biosystematics Group
> Wageningen University
>        [[alternative HTML version deleted]]
> _______________________________________________
> R-sig-phylo mailing list
> R-sig-phylo at r-project.org
> https://stat.ethz.ch/mailman/listinfo/r-sig-phylo

More information about the R-sig-phylo mailing list