CRAN Task View: Phylogenetics, Especially Comparative Methods

Maintainer: Brian O'Meara
Contact:omeara.brian at

The history of life unfolds within a phylogenetic context. Comparative phylogenetic methods are statistical approaches for analyzing historical patterns along phylogenetic trees. This task view describes R packages that implement a variety of comparative phylogenetic methods. This is an active research area, and much of the information is subject to change. Note that many important packages are not on CRAN: either they were formerly on CRAN and were later archived (for example, because they were not updated to keep pace with changes in R), or they are developed elsewhere and have not yet been submitted to CRAN. Such packages may be found on GitHub, R-Forge, or authors' websites.

Getting trees into R: Trees in R are usually stored in the S3 phylo class (implemented in ape), though the S4 phylo4 class (implemented in phylobase) is also available. ape can read trees from external files in Newick format (sometimes popularly known as PHYLIP format) or NEXUS format. It can also read trees input by hand as a Newick string (e.g., "(human,(chimp,bonobo));"). treebase can search for and load trees from the online tree repository TreeBASE. PHYLOCH can load trees from BEAST, MrBayes, and other phylogenetics programs (PHYLOCH is only available from the author's website).
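As a minimal sketch of the ape route described above (assuming ape is installed; the temporary file name is only for illustration), a tree can be parsed from a hand-typed Newick string and round-tripped through a file:

```r
library(ape)

# Parse a tree typed by hand as a Newick string
tr <- read.tree(text = "(human,(chimp,bonobo));")

# The result is an S3 object of class "phylo"
class(tr)
tr$tip.label

# Round trip through a Newick file (illustrative temporary file)
f <- tempfile(fileext = ".tre")
write.tree(tr, file = f)
tr2 <- read.tree(f)

# NEXUS files are read with read.nexus() in the same way
```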

Utility functions: These packages include functions for manipulating trees or associated data. ape has functions for randomly resolving polytomies, creating branch lengths, getting information about tree size or other properties, and many more. phylobase has functions for traversing a tree (e.g., getting all descendants of a node specified by just two of its descendants). geiger can prune trees and data to an overlapping set of taxa. evobiR can do fuzzy matching of names (to allow for small differences).
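A short sketch of the ape utilities mentioned above, using a small made-up tree with one polytomy (the tree and the choice of Grafen's method for branch lengths are illustrative assumptions):

```r
library(ape)

# A rooted tree in which c, d and e form a polytomy
tr <- read.tree(text = "((a,b),(c,d,e));")
is.binary(tr)                     # is the tree fully bifurcating?

set.seed(1)
res <- multi2di(tr)               # randomly resolve the polytomy
res <- compute.brlen(res, method = "Grafen")  # create branch lengths

# Basic information about tree size
Ntip(res)
Nnode(res)
```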

Ancestral state reconstruction: Continuous characters can be reconstructed using maximum likelihood, generalised least squares or independent contrasts in ape. Root ancestral character states under Brownian motion or Ornstein-Uhlenbeck models can be reconstructed in ouch, though states at the internal nodes cannot. In ape, discrete characters can be reconstructed using a variety of Markov models that parameterize the transition rates among states. phytools can do stochastic character mapping of traits on trees.
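The following sketch illustrates reconstruction with ape's ace() function on simulated data. The tree, the trait values, and the arbitrary two-state character are all made up for illustration, and the equal-rates ("ER") model is just one of the available Markov models:

```r
library(ape)

set.seed(42)
tr <- rcoal(20)                               # simulated tree (illustration only)
x  <- rTraitCont(tr, model = "BM", sigma = 0.1)

# Continuous character, reconstructed via independent contrasts;
# method = "ML" or "GLS" selects the other approaches
anc <- ace(x, tr, type = "continuous", method = "pic")
head(anc$ace)                                 # one estimate per internal node

# Discrete character under an equal-rates Markov model
# (states assigned arbitrarily to tips, purely as an example)
y   <- setNames(rep(c("a", "b"), each = 10), tr$tip.label)
fit <- ace(y, tr, type = "discrete", model = "ER")
head(fit$lik.anc)                             # scaled likelihoods of each state per node
```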

Diversification Analysis: Lineage through time plots can be done in ape. A simple birth-death model for when you have extant species only (sensu Nee et al. 1994) can be fitted in ape, as can survival models and goodness-of-fit tests (as applied to testing of models of diversification). TESS can calculate the likelihood of a tree under a model with time-dependent diversification, including mass extinctions. Net rates of diversification (sensu Magallón and Sanderson) can be calculated in geiger. diversitree implements the BiSSE method (Maddison et al. 2007) and later improvements (FitzJohn et al. 2009). TreePar estimates speciation and extinction rates with models where rates can change as a function of time (e.g., at mass extinction events) or as a function of the number of species. caper can do the macrocaic test to evaluate the effect of a trait on diversity. apTreeshape also has tests for differential diversification (see the package description). iteRates can identify and visualize areas on a tree undergoing differential diversification. DDD can fit density-dependent models as well as models with occasional escape from density dependence.
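As a rough sketch of the ape functionality mentioned above, the example below uses the bird.orders data set that ships with ape to extract lineage-through-time coordinates and fit the simple birth-death model:

```r
library(ape)

data(bird.orders)                 # ultrametric example tree shipped with ape

# Coordinates of the lineage-through-time curve;
# ltt.plot(bird.orders) would draw the same curve
coords <- ltt.plot.coords(bird.orders)
head(coords)

# Simple birth-death model for a tree of extant species only
fit <- birthdeath(bird.orders)
fit$para                          # d/b and b - d
```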

Divergence Times: Non-parametric rate smoothing (NPRS) and penalized likelihood can be implemented in ape.
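A minimal sketch of penalized likelihood dating with ape's chronos() function, applied to a random tree generated only for illustration. The smoothing parameter lambda = 1 and the default calibration (root age scaled to 1) are assumptions, not recommendations:

```r
library(ape)

set.seed(7)
tr <- rtree(15)                   # random tree with branch lengths (illustration only)

# Penalized likelihood with the default "correlated" rate model
chr <- chronos(tr, lambda = 1, quiet = TRUE)

# Node ages of the resulting chronogram
head(branching.times(chr))
```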

Phylogenetic Inference: UPGMA, neighbour joining, BIONJ and fast ME methods of phylogenetic reconstruction are all implemented in the package ape. phangorn can estimate trees using distance, parsimony, and likelihood. phyclust can cluster sequences. phytools can build trees using MRP supertree estimation and least squares. scaleboot can perform the Shimodaira-Hasegawa test for comparing trees. phylotools can build supermatrices for analyses in other software. For more information on importing sequence data, see the Genetics task view.
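The distance-based methods in ape can be sketched as follows, using the woodmouse alignment that ships with ape. The K80 distance model is an arbitrary choice for the example, and UPGMA is obtained here through base R's hclust() followed by conversion to a phylo object:

```r
library(ape)

data(woodmouse)                   # example DNA alignment shipped with ape
d <- dist.dna(woodmouse, model = "K80")

tr_nj    <- nj(d)                 # neighbour joining
tr_bionj <- bionj(d)              # BIONJ
tr_me    <- fastme.bal(d)         # balanced minimum evolution
tr_upgma <- as.phylo(hclust(d, method = "average"))  # UPGMA
```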

Time series: Paleontological time series data can be analyzed in paleoTS, which provides a likelihood-based framework for fitting and comparing models of phyletic evolution (based on the random walk or stasis model).

Tree Simulations: Trees can be simulated using constant-rate birth-death with various constraints in TreeSim and a birth-death process in geiger. Random trees can be generated in ape by random splitting of edges (for non-parametric trees) or random clustering of tips (for coalescent trees). paleotree can simulate fossil deposition, sampling, and the tree arising from this as well as trees conditioned on observed fossil taxa. TESS can simulate trees with time-dependent speciation and/or extinction rates, including mass extinctions.
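For the ape generators mentioned above, a minimal sketch (tree size and seed are arbitrary):

```r
library(ape)

set.seed(123)
t_split <- rtree(10)              # random splitting of edges
t_coal  <- rcoal(10)              # random clustering of tips (coalescent)

is.rooted(t_split)
is.ultrametric(t_coal)            # coalescent trees are ultrametric
```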

Trait evolution: Independent contrasts for continuous characters can be calculated using ape, picante, or caper (which also implements the brunch and crunch algorithms). Analyses of discrete trait evolution, including models of unequal rates or rates changing at a given instant of time, as well as Pagel's transformations, can be performed in geiger. corHMM can look for hidden rates in discrete traits as well as fit correlational models for two or three binary traits (similar to Pagel's old Discrete program) and complex models for multistate traits (similar to Pagel's old Multistate program). Brownian motion models can be fit in geiger, ape, and paleotree. Multiple-rate Brownian motion can be fit in motmot and RBrownie (neither is currently on CRAN, but older versions can be obtained from the archive). Deviations from Brownian motion can be investigated in geiger, OUwie, and PVR. Ornstein-Uhlenbeck (OU) models can be fitted in geiger (single-optimum models only), ape, ouch (with multiple means), and OUwie (with multiple means, rates, and attraction values). maticce uses ouch to search for where a regime transition occurs (it was recently removed from CRAN and will not install from R-Forge). Other continuous models, including Pagel's transforms and models with trends, can be fit with geiger. ANOVAs and MANOVAs in a phylogenetic context can also be implemented in geiger. Traditional GLS methods (sensu Grafen or Martins) can be implemented in ape or caper. Phylogenetic autoregression (sensu Cheverud et al.) and phylogenetic autocorrelation (Moran's I) can be implemented in ape or, if you want the significance test of Moran's I computed by a randomization procedure, in adephylo. Correlation between traits using a GLMM can also be investigated using MCMCglmm. phylolm can fit phylogenetic linear regression and phylogenetic logistic regression models using a fast algorithm, making it suitable for large trees. phytools can also investigate rates of trait evolution and do stochastic character mapping. metafor can perform meta-analyses accounting for phylogenetic structure. geomorph can do geometric morphometric analysis in a phylogenetic context.
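As an illustration of independent contrasts in ape, the sketch below computes contrasts for two simulated traits and regresses one set on the other through the origin. The tree and traits are simulated only so that the example is self-contained:

```r
library(ape)

set.seed(3)
tr <- rcoal(30)                   # simulated tree (illustration only)
x  <- rTraitCont(tr)              # two traits simulated independently
y  <- rTraitCont(tr)              # under Brownian motion

# Phylogenetically independent contrasts, one per internal node
px <- pic(x, tr)
py <- pic(y, tr)

# Contrasts are analysed with a regression forced through the origin
fit <- lm(py ~ px - 1)
summary(fit)
```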

Trait Simulations: Continuous traits can be simulated using Brownian motion in ouch, geiger, ape, picante, OUwie, and caper; using the Hansen model (a form of the OU model) in ouch and OUwie; and using a speciational model in geiger. Discrete traits can be simulated using a continuous-time Markov model in geiger. phangorn can simulate DNA or amino acids, and phylosim can simulate all of these as well as insertions and deletions. Both discrete and continuous traits can be simulated under models where rates change through time in geiger. phytools can simulate discrete characters using stochastic character mapping. phylolm can simulate continuous or binary traits along a tree.
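A small sketch of trait simulation in ape. The text above points to geiger, ouch and OUwie for discrete and OU simulations; ape's own rTraitCont() and rTraitDisc() are used here only to keep the example to a single package, and all parameter values are arbitrary:

```r
library(ape)

set.seed(11)
tr <- rcoal(20)                   # simulated tree (illustration only)

# Continuous trait under Brownian motion
bm <- rTraitCont(tr, model = "BM", sigma = 0.1)

# Continuous trait under an OU process with optimum theta
ou <- rTraitCont(tr, model = "OU", sigma = 0.1, alpha = 1, theta = 0)

# Two-state discrete trait under an equal-rates Markov model
dtr <- rTraitDisc(tr, model = "ER", k = 2, rate = 0.1)
table(dtr)
```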

Tree Manipulation: Branch length scaling using ACDC; Pagel's (1999) lambda, delta and kappa parameters; and the Ornstein-Uhlenbeck alpha parameter (for ultrametric trees only) are available in geiger. phytools also allows branch length scaling, as well as several tree transformations (adding tips, finding subtrees). Rooting, resolving polytomies, dropping of tips, and setting of branch lengths (including Grafen's method) can all be done using ape. Extinct taxa can be pruned using geiger. phylobase offers numerous functions for querying and using trees (S4). Tree rearrangements (NNI and SPR) can be performed with phangorn. paleotree has functions for manipulating trees based on sampling issues that arise with fossil taxa as well as more universal transformations.
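The ape operations for dropping tips and rooting can be sketched as follows. The tree and the trait vector `dat` (with one taxon, "z", missing from the tree) are made up for illustration:

```r
library(ape)

tr <- read.tree(text = "(((a,b),c),(d,e));")

# Drop a single tip
pruned <- drop.tip(tr, "d")

# Re-root on an outgroup taxon
rerooted <- root(tr, outgroup = "e", resolve.root = TRUE)

# Keep only the taxa shared between the tree and a data vector
dat    <- c(a = 1.2, b = 0.7, c = 3.1, z = 2.0)   # hypothetical trait data
shared <- intersect(tr$tip.label, names(dat))
matched <- keep.tip(tr, shared)
matched$tip.label
```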

Community/Microbial Ecology: picante, vegan, SYNCSA, phylotools and caper integrate several tools for using phylogenetics with community ecology. HMPTrees and GUniFrac provide tools for comparing microbial communities.

Phyloclimatic Modeling: phyloclim integrates several new tools in this area.

Species/Population Delimitation: spider can use DNA barcoding data to investigate species delimitation and related studies.

Tree Plotting and Visualization: User trees can be plotted using ape, adephylo, phylobase, phytools, and ouch. paleoPhylo and paleotree are specialized for drawing paleobiological phylogenies. Trees can also be examined (zoomed) and viewed as correlograms using ape. Ancestral state reconstructions can be visualized along branches using ape and paleotree. phytools can project a tree into a morphospace.
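A minimal plotting sketch with ape, using the bird.orders tree that ships with the package. The null PDF device is an assumption made so that the example also runs without a display; in an interactive session the pdf() and dev.off() lines can be dropped:

```r
library(ape)

data(bird.orders)

pdf(NULL)                          # null device; omit when working interactively
info <- plot(bird.orders, type = "phylogram", cex = 0.8)
axisPhylo()                        # time axis along the tree
nodelabels(cex = 0.6)              # label internal nodes with their numbers
dev.off()

# plot.phylo() invisibly returns the settings used for the plot
info$type
```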

Tree Comparison: Tree-tree distances can be evaluated, and used in additional analyses, in distory. ape can compute tree-tree distances and also create a plot showing two trees with links between associated tips.
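For the ape functions mentioned above, a short sketch with two small made-up trees on the same taxa. dist.topo() computes a topological distance, and the commented call shows how the linked two-tree plot could be drawn:

```r
library(ape)

t1 <- read.tree(text = "(((a,b),c),(d,e));")
t2 <- read.tree(text = "(((a,c),b),(d,e));")

# Topological distance (default method "PH85")
d_same <- dist.topo(t1, t1)
d_diff <- dist.topo(t1, t2)
d_diff

# Two-column association matrix linking identically named tips
assoc <- cbind(t1$tip.label, t1$tip.label)
# cophyloplot(t1, t2, assoc = assoc) draws both trees with links between tips
```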

Miscellaneous: treebase offers ways to download trees from TreeBASE, an online repository of phylogenies and phylogenetic data. rmesquite offers a way to call headless Mesquite from R, which is useful for many kinds of analyses. To do the reverse, use R.Mesquite.


CRAN packages:

Related links: