[R-sig-phylo] tree thievery

Grenyer, Richard grenyer at imperial.ac.uk
Fri Nov 7 15:52:48 CET 2008


For historical completeness, also:

http://evolve.zoo.ox.ac.uk/software/TreeThief/main.html

It is a Mac OS 9 program, so nearly useless nowadays, but the author
might provide algorithms or advice. Apologies if this has been covered
off-list.

Regards,

Rich

-----Original Message-----
From: r-sig-phylo-bounces at r-project.org
[mailto:r-sig-phylo-bounces at r-project.org] On Behalf Of Roderic Page
Sent: 07 November 2008 09:43
To: Brian O'Meara
Cc: r-sig-phylo at r-project.org; bbolker; Joseph Hughes
Subject: Re: [R-sig-phylo] tree thievery

Hi all (and thanks to Brian for forwarding this to me),

I have been looking at this problem, and we've made a little progress.  
Joseph Hughes (in my lab) wrote a tool to automate the whole process  
(from image to tree). It's a bit slow and there are issues. It has a
web site (http://page-70.zoology.gla.ac.uk/~jhughes/treebank/upimage.html);
Joseph's moving labs today, so it might not be online at present.

The rate-limiting step is recognising corners in an image, which seems
to take a long time. I've pretty much abandoned plans to get this
working fast enough for the December 1 deadline. It would be nice to
have a fast, automatic way of doing this.

In the meantime, getting x,y coordinates manually might not be a  
shortcut. There's a nice Mac OS X program which does this
(http://www.arizona-software.ch/graphclick/).

Regards

Rod


On 6 Nov 2008, at 23:20, Brian O'Meara wrote:

> This sounds great, Ben.  Have you talked to Rod Page about his
> "Elsevier Grand Challenge" project
> (http://precedings.nature.com/documents/2217/version/1), which involves
> parsing PDFs from Molecular Phylogenetics and Evolution to extract
> trees and other data? It sounds like you two might encounter similar
> issues.
>
> Brian
>
>
> On Nov 4, 2008, at 12:47 PM, bbolker wrote:
>
>> [background for r-sig-phylo: some of us have been talking about
>> the problems of grabbing trees from the literature when they
>> are not available in TreeBase or as Nexus or Newick format
>> from the authors.  Reconstructing Newick format from a big
>> tree is a huge pain, as anyone who has tried it will know, and
>> even then one wants the branch lengths as well as the topology]
>>
>>  The  problem of reconstructing trees from a set of (x,y) points
>> turns out not to be all that hard -- even "trivial" from the
>> computational point of view. The R function below takes
>> a set of (x,y) points, number of tips, and tip labels, and
>> returns a tree in "phylo" format [it assumes that all
>> the tips are first in the list of points, otherwise I think
>> order shouldn't matter].  I haven't tried it on
>> ultrametric trees, and I know that polytomies will
>> be trouble.
>>
>>  The examples below take the node (x,y) locations from
>> some of the ape examples (the tiny "owl tree" and the
>> bird.orders data set), which are retrievable using some
>> black magic, and reconstruct the trees.  **The trees do
>> not come back in the same order** (is this a problem?)
>> but they are equivalent.
>>
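
On the "is this a problem?" question: one way to check that two trees are equivalent regardless of node numbering or tip order is to compare the sets of clades they imply. The following is only a base-R sketch under my own assumptions. clade_set() is a hypothetical helper, not an ape function, and it compares topology only, ignoring branch lengths. (ape also provides an all.equal method for "phylo" objects, which may be the more convenient route.)

```r
## Hypothetical helper: the set of clades (as sorted tip-label strings)
## implied by a phylo-style edge matrix (column 1 = parent, 2 = child).
clade_set <- function(edge, tip.label) {
  ntip <- length(tip.label)
  ## recursively collect the tip labels descending from a node
  tips_below <- function(node) {
    if (node <= ntip) return(tip.label[node])
    kids <- edge[edge[, 1] == node, 2]
    unlist(lapply(kids, tips_below))
  }
  internal <- unique(edge[, 1])
  sort(vapply(internal,
              function(n) paste(sort(tips_below(n)), collapse = "|"),
              character(1)))
}

## The same four-tip topology, numbered and ordered differently.
e1 <- rbind(c(5, 6), c(6, 7), c(7, 1), c(7, 2), c(6, 3), c(5, 4))
e2 <- rbind(c(5, 1), c(5, 6), c(6, 2), c(6, 7), c(7, 3), c(7, 4))
l1 <- c("Strix", "Asio", "Athene", "Tyto")
l2 <- c("Tyto", "Athene", "Asio", "Strix")
same_topology <- identical(clade_set(e1, l1), clade_set(e2, l2))
```

With real trees one would pass tree$edge and tree$tip.label for each of the two objects being compared.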
>>   Getting the (x,y) points into R in the first place is also
>> a potential challenge.  Two possible solutions: use g3data
>> (notes included below), a standalone, cross-platform
>> utility for retrieving point locations from image files.
>> One could also write a small R program that took
>> an image file, plotted it, and use locator() to get
>> the points (using pixmap:::read.pnm?).
>> I think I've written something like this
>> before, but would have to dig it up or redo it -- and
>> g3data has a nicer interface.
>>
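
A rough sketch of what the locator() route might look like. The only real work is rescaling clicks to the figure's own units from two reference clicks of known value, which is the same idea as the axis-calibration step in g3data. calibrate() is a hypothetical helper name, the click positions are made up, and the interactive part is commented out because it needs a graphics device and an image already read into R.

```r
## Hypothetical helper (not part of g3data or ape): map clicked
## coordinates onto the figure's own scale, given two reference
## clicks on an axis whose true values are known.
calibrate <- function(clicked, ref_clicked, ref_value) {
  stopifnot(length(ref_clicked) == 2, length(ref_value) == 2,
            ref_clicked[1] != ref_clicked[2])
  slope <- diff(ref_value) / diff(ref_clicked)
  ref_value[1] + slope * (clicked - ref_clicked[1])
}

## Interactive part, left commented out.  'img' stands for any raster
## image already read into R; the 0..10 scale bar is an assumption.
# plot(0:1, 0:1, type = "n", xlab = "", ylab = "")
# rasterImage(img, 0, 0, 1, 1)
# ref <- locator(2)   # click the two ends of the scale bar
# pts <- locator()    # click tips first, then internal nodes
# xx  <- calibrate(pts$x, ref$x, c(0, 10))

## Non-interactive check with made-up click positions:
clicks_x <- c(0.2, 0.5, 0.8)
scaled_x <- calibrate(clicks_x, ref_clicked = c(0.2, 0.8),
                      ref_value = c(0, 12))
```

The y coordinates only need to preserve vertical order and spacing for the reconstruction, so they can probably be used as clicked.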
>> ##
>> library(ape)
>>
>> ## from ?plot.tree:
>> cat("(((Strix_aluco:4.2,Asio_otus:4.2):3.1,",
>>    "Athene_noctua:7.3):6.3,Tyto_alba:13.5);",
>>    file = "ex.tre", sep = "\n")
>> tree.owls <- read.tree("ex.tre")
>> plot(tree.owls)
>> unlink("ex.tre") # delete the file "ex.tre"
>>
>> plot(tree.owls)
>> xy <- get("last_plot.phylo",envir=.PlotPhyloEnv)
>> xx <- xy$xx
>> yy <- xy$yy
>> points(xx,yy,col="white",pch=16,cex=2)
>> text(xx,yy,col=2,1:length(xx))
>>
>> ## assumes left-to-right horizontal tree -- may need some logic for
>> ##  different directions
>> ## assumes first N points are tips.
>> ##
>> ## polytomies?? may need to be explicitly identified ...
>> ## should?? work on non-ultrametric trees, but untested
>> build.tree <- function(xx,yy,tip.labels,ntips,
>>                       poly=numeric(0),
>>                       debug=FALSE) {
>>  if (!missing(tip.labels)) ntips <- length(tip.labels)
>>  nodes <- 1:length(xx)
>>  is.tip <- nodes<=ntips
>>  if (which.min(xx)!=ntips+1) {
>>    ## reorder internal nodes by x so the root (smallest x) comes
>>    ## first, which is the numbering ape/phylo expects
>>    internal <- !is.tip
>>    ord <- order(xx[internal])
>>    yy[internal] <- yy[internal][ord]
>>    xx[internal] <- xx[internal][ord]
>>  }
>>  edges <- matrix(nrow=0,ncol=2)
>>  edge.length <- numeric(0)
>>  nnode <- length(xx)-ntips
>>  while (length(xx)>1) {
>>    ## find next node to include
>>    nextnode <- which(!is.tip & xx==max(xx[!is.tip]))[1]
>>    ## find daughters
>>    dist <- abs(yy-yy[nextnode])
>>    daughters <- which(is.tip & dist==min(dist[is.tip]))
>>    ## be careful with numeric fuzz?
>>    edges <- rbind(edges,
>>                   nodes[c(nextnode,daughters[1])],
>>                   nodes[c(nextnode,daughters[2])])
>>    edge.length <- c(edge.length,xx[daughters]-xx[nextnode])
>>    xx <- xx[-daughters]
>>    yy <- yy[-daughters]
>>    is.tip[nextnode] <- TRUE
>>    is.tip <- is.tip[-daughters]
>>    nodes <- nodes[-daughters]
>>  }
>>  ## "phylo" objects store the labels in an element named tip.label
>>  zz <- list(tip.label=tip.labels,
>>             edge=edges,
>>             edge.length=edge.length,
>>             Nnode=nnode)
>>  class(zz) <- "phylo"
>>  zz <- reorder(zz)
>>  zz
>> }
>>
>> newtree <- build.tree(xx,yy,tree.owls$tip.label)
>>
>> data(bird.orders)
>> plot(bird.orders,show.node.label=TRUE)
>> xy <- get("last_plot.phylo",envir=.PlotPhyloEnv)
>> ## extract the new coordinates before drawing the node numbers
>> xx <- xy$xx
>> yy <- xy$yy
>> points(xx,yy,col="white",pch=16,cex=2)
>> text(xx,yy,col=2,1:length(xx))
>>
>> newtree2 <- build.tree(xx,yy,bird.orders$tip.label)
>>
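
On the "numeric fuzz" comment in build.tree: with hand-digitized points, the exact test dist==min(dist[is.tip]) will rarely find two daughters, because the clicks will not be exactly equidistant from the node. A possible alternative, assuming a binary tree drawn so that each node sits vertically between its two daughters, is to take the nearest active node above and the nearest one below. find_daughters() is a hypothetical name and this is a sketch, not a tested replacement.

```r
## Hypothetical replacement for the exact-equality test in build.tree.
## Returns the indices of the nearest active node below and the
## nearest active node above 'node' on the y axis.
find_daughters <- function(yy, is.tip, node) {
  dy <- yy - yy[node]
  below <- which(is.tip & dy < 0)
  above <- which(is.tip & dy > 0)
  if (length(below) == 0 || length(above) == 0)
    stop("node ", node, " does not sit between two active nodes")
  c(below[which.max(dy[below])], above[which.min(dy[above])])
}

## Owl-tree layout with a little click noise added to the node heights
## (points 1-4 are tips, 5 is the root, 7 is the most recent node).
yy     <- c(1, 2, 3, 4, 3.13, 2.24, 1.52)
is.tip <- c(TRUE, TRUE, TRUE, TRUE, FALSE, FALSE, FALSE)
first_pair <- find_daughters(yy, is.tip, node = 7)
```

This does nothing for polytomies, which would still need to be identified explicitly.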
>> ===========
>> g3data notes:
>> ============
>>
>> INSTALLATION: install g3data and (for Windows) clip2png.jar
>>
>> Ubuntu and other Debians:
>>
>>  sudo apt-get install g3data
>>
>> Windows:
>>   http://www.frantz.fi/software/Windows/g3data-1.5.1-win32.zip
>>
>> Mac (OS X 10.4 or 10.5): available via fink
>>  http://www.finkproject.org/doc/users-guide/index.php
>>  fink install g3data (?) or
>>  fink -b install g3data
>>
>>
>> get clip2png.jar :
>>
>> google "clip2png.jar", or go to ...
>> http://sourceforge.net/project/showfiles.php?group_id=185579
>> click on "download"
>> scroll down and click on "clip2png.jar"
>> save it somewhere (desktop?)
>>
>> USAGE
>>
>>  open the paper in your favorite PDF viewer
>>  select the desired figure, including axes but as little else as
>>    possible, and copy to the clipboard, then save the clipboard as a
>>    PNG or GIF
>>
>>  OR adjust the PDF window so the figure fills it and take a snapshot
>>    of the window (on Ubuntu: alt-printscreen), save as PNG or GIF
>>
>>  open g3data
>>
>>  click on two points on the X and Y axis, fill in values
>>
>>  click on points
>>
>>  if you need to compress the display so that you can see the output
>>    actions, use the View menu or function keys to toggle display of
>>    zoom area (F5), axis settings (F6), or output properties (F7)
>>
>>  for multiple series, either click on points in order (e.g. work
>>    left-to-right for each series), then edit your output to put tags
>>    on increasing series, or output each series to a separate data file
>>
>>  note that by default g3data will save your data to a file named
>>    after your graphics file, e.g. "mydata.png.dat". In Windows this
>>    will show up as a file called "mydata.png" with a DAT file type,
>>    which may be confusing.
>>
>>  reading into excel: use "Data" menu to separate into columns
>>
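
For reading the saved points into R rather than Excel, something like the following might do. It assumes the .dat file holds one point per line, x then y, separated by whitespace. I have not checked this against every g3data output setting, so treat the layout as an assumption. A stand-in file is written first so that the example runs without g3data.

```r
## Assumed layout: one point per line, x then y, whitespace-separated.
## Stand-in file written here so the example runs without g3data.
dat_file <- file.path(tempdir(), "mydata.png.dat")
writeLines(c("9.4 1.5", "6.3 2.25", "0.0 3.125"), dat_file)

pts <- read.table(dat_file, col.names = c("x", "y"))
xx <- pts$x
yy <- pts$y
unlink(dat_file)  # remove the stand-in file
```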
>>  Wish list for g3data:
>>
>> csv format output?
>> series tagging?
>> keyboard shortcuts for Save (Ctrl-S), Save As (Ctrl-A)?
>> built-in documentation?
>>
>>
>> plot(newtree2)
>>
>> _______________________________________________
>> R-sig-phylo mailing list
>> R-sig-phylo at r-project.org
>> https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
>
>
> ________________________________
> Brian O'Meara
> NESCent
> Durham, NC
> http://www.brianomeara.info
> ________________________________
>
>
>
>

---------------------------------------------------------
Roderic Page
Professor of Taxonomy
DEEB, FBLS
Graham Kerr Building
University of Glasgow
Glasgow G12 8QQ, UK

Email: r.page at bio.gla.ac.uk
Tel: +44 141 330 4778
Fax: +44 141 330 2792
AIM: rodpage1962 at aim.com
Facebook: http://www.facebook.com/profile.php?id=1112517192
Twitter: http://twitter.com/rdmpage
Blog: http://iphylo.blogspot.com
Home page: http://taxonomy.zoology.gla.ac.uk/rod/rod.html
