[BioC] boot.phylo( ) function
Emmanuel Paradis
paradis at isem.univ-montp2.fr
Thu Mar 22 09:15:55 CET 2007
Dear all,
I have replied privately to this query, mentioning that the BioC list
may not be appropriate for this kind of question, but I was apparently
wrong (I'm not on the list, so not aware of the usual topics).
Martin Morgan wrote:
> Hi Nora --
>
> I do not have direct experience with this package, but from the help
> page
>
>> library(ape)
>> ?boot.phylo
>
> perhaps
>
> allbacteria <- read.dna("allbacteriafasta","fasta")
> distallbacteria <-
> dist.dna(allbacteria, pairwise.deletion=TRUE, as.matrix=TRUE)
> njtree <- nj(distallbacteria)
>
> boot.phylo(njtree,
> allbacteria,
> FUN=function(bootstrappedData) {
> nj(dist.dna(bootstrappedData,
> pairwise.deletion=TRUE,
> as.matrix=TRUE))
> })
>
> works?
Yes, this is it. The option as.matrix=TRUE is not needed, and will
likely slow down the whole thing.
Earl Glynn asked me for an example of using boot.phylo with a "toy"
problem. Here's what can be done in R:
library(ape)
data(woodmouse)
d <- dist.dna(woodmouse)
tr <- nj(d)
bp <- boot.phylo(tr, woodmouse, function(xx) nj(dist.dna(xx)))
plot(tr)
nodelabels(bp)
Then you can change the options of dist.dna and boot.phylo to see how
this may affect the results. There is a more "real life" example in my
book "APER" (this case study is actually spread in several chapters, so
that it explains the analysis from A to Z, or nearly).
boot.phylo is fairly fast for small to moderate size, on my laptop I have:
> system.time(bp <- boot.phylo(tr, woodmouse, function(xx)
nj(dist.dna(xx))))
[1] 11.089 0.008 11.098 0.000 0.000
These times are expected to be quite longer for big data sets as a
substantial part of the computation is done in R (I plan to rewrite this
in C sometime). Also I have experimental codes for dist.dna that are
much faster than the current ones.
> Martin
>
> Nora Muda <noramuda at yahoo.com> writes:
>
>> Dear BioConducter useRs,
>>
>> I have problems in writing boot.phylo() function.
>>
>> Let say I have 30 aligned sequences; then I computed
>> pairwise distances with "K80" method. Then I construct
>> phylogenetic tree with neighbor-joining method and my
>> proposed method. Now I have problems in writing "FUN"
>> in boot.phylo() function. Below are examples of my
>> programs:
>>
>> library(ape)
>> allbacteria <- read.dna("allbacteriafasta","fasta")
>> distallbacteria <-
>> dist.dna(allbacteria,pairwise.deletion=TRUE,as.matrix=TRUE)
>> plot(nj(distallbacteria))
>>
>> boot.phylo(plot(nj(distallbacteria)),allbacteria,nj(distallbacteria))
>>
>> What should I put as FUN in boot.phylo?
>>
>> I make comparison between distances of "K80" in PHYLIP
>> and dist.dna("K80").There are a lot of differences esp
>> in PHYLIP there is a default for
>> transversion/transition rate; which is 2 but not in
>> ape package. How to modify it to make it the same?
Here the answer I sent to Nora:
Indeed, and these differences are expected. PHYLIP assumes a
"theoretically expected" value of the ts/tv ratio, the distance is then
calculated by maximizing a likelihood function. (I owe this information
to my colleague Olivier Gascuel who dissected PHYLIP's code.) You can
change the default of the ts/tv ratio, in which case it is also
estimated by ML. dist.dna is faithful to Kimura's original formulae; it
also computes the variance of the distances (which PHYLIP does not).
Emmanuel Paradis
More information about the Bioconductor
mailing list