[BioC] Some particulars about rtracklayer::liftOver
Steve Lianoglou
mailinglist.honeypot at gmail.com
Tue Apr 24 08:43:39 CEST 2012
Hi all,
I'm really not too well versed on the ins and outs of using `liftOver`
from via the kent tools, so I'm comparing results from using
rtracklayer::liftOver vs. a default run of liftOver from the command
line.
As a disclaimer, I'm lifting over coordinates between species. I know
this isn't the best idea -- the party line from the UCSC folks seem to
suggest that this is a bad idea because it is a "coarse" mapping, and
you might really want to use pslMap in this case (which I will play
with, too). For those who are interested, here's a few links for you:
https://lists.soe.ucsc.edu/pipermail/genome/2007-November/015074.html
https://lists.soe.ucsc.edu/pipermail/genome/2008-March/015908.html
(summarizes next link)
https://lists.soe.ucsc.edu/pipermail/genome/2007-November/015032.html
http://article.gmane.org/gmane.science.biology.ucscgenome.general/11300
The correct command line switches to invoke pslMap using a liftover chain:
https://lists.soe.ucsc.edu/pipermail/genome/2008-March/015908.html
>From the first link, pslMap is apparently preferred because:
"""
pslMap shows more detail -- it translates a region using each block
of the chain, introducing gaps where the chain has gaps.
"""
>From some results of rtracklayer::liftOver, it seems that this call
implements this type of functionality from pslMap -- is that true?
I say that because some of the ranges I liftOver are split into > 1
range into the target genome. Even though they are two ranges,
sometimes they are contiguous, or only separated by a small gap.
>From the few cases I've checked -- some examples below -- these
multiple ranges returned from rtracklayer::liftOver correspond to the
results from doing a pslMap
For instance:
The following mm9 range:
chr1 [4836085, 4836142] +
Gets lifted to hg19 here (notice small gap of width 5) with rtracklayer:
chr8 [54959674, 54959715] -
chr8 [54959654, 54959669] -
but from the command line, it's lifted to:
chr8 54959654 54959716 ... -
At first I thought the rtracklayer fenceposts and the
ucsc/cmd-line::liftOver results fence-posted the same interval, but
the command line result is supposed to be ucsc-half-open (so start
from 0), right? The GRanges were exported to a bed via rtracklayer, so
I can confirm that coordinates were xlated to half-open correctly. But
the result of the liftOver from the command line is off-by-one at its
start position when compared to rtracklayer, no?
Further -- these two target ranges are the same that come out of a
pslMap, so -- that's nice.
I started writing this post to inquire rtracklayer's insertion of gaps
and to confirm that my by eye matching of this w/ pslMap output is
intentional (I guess that's weird of me to ask, but just wanted to be
sure). Then this off-by-one thing popped up as I was writing. There
are other regions that are lifted over whose coords from the command
line seem to have "the correct" -1 shift at the start ("correct" in as
far as I'm mentally doing the half-open coord shift), and equal at the
end, eg. the mm9 range:
chr1 [9617177, 9617221] +
usring rtracklayer::liftOver goes to hg19:
chr8 [67430719, 67430763] +
and from command line:
chr8 67430718 67430763 ... +
which seems to me like the same interval as rtracklayer.
So, in a nut:
(1) rtracklayer's gapped-liftOver magic seems to return the same info
I would get from a call to pslMap. So, (a) thanks for that, and (b)
I'm correct, right? :-)
(2) This off by one thing -- am I just seeing things, or?
Thanks for making it this far,
-steve
--
Steve Lianoglou
Graduate Student: Computational Systems Biology
| Memorial Sloan-Kettering Cancer Center
| Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact
More information about the Bioconductor
mailing list