[BioC] rtracklayer::liftOver ordering
Kasper Daniel Hansen
kasperdanielhansen at gmail.com
Thu Aug 25 03:42:41 CEST 2011
How efficient would this be? I sometimes use liftOver on millions of regions.
Kasper
2011/8/24 Michael Lawrence <lawrence.michael at gene.com>:
> That's a good idea. I can make that change.
>
> Michael
>
> 2011/8/24 Hervé Pagès <hpages at fhcrc.org>
>
>> Hi there,
>>
>>
>> On 11-08-24 10:48 AM, Michael Lawrence wrote:
>>
>>> On Wed, Aug 24, 2011 at 8:28 AM, Andrew Jaffe <ajaffe at jhsph.edu> wrote:
>>>
>>>> I'm having a problem maintaining the ordering of my GRanges object
>>>> when I lift it over using rtracklayer::liftOver. For example:
>>>>
>>>> > g # my regions
>>>> GRanges with 5 ranges and 0 elementMetadata values
>>>>     seqnames                 ranges strand |
>>>>        <Rle>              <IRanges>  <Rle> |
>>>> [1]    chr19 [ 13130686,  13133039]      * |
>>>> [2]     chr4 [160026138, 160028079]      * |
>>>> [3]    chr12 [ 65671230,  65672140]      * |
>>>> [4]     chr8 [ 19615409,  19616461]      * |
>>>> [5]    chr14 [ 99706752,  99708661]      * |
>>>>
>>>> > chain = import.chain("hg19ToHg18.over.chain") # from UCSC
>>>> > lifted = liftOver(g, chain) # suppressed unmatched chrs
>>>> > lifted
>>>>
>>>> GRanges with 5 ranges and 0 elementMetadata values
>>>>     seqnames                 ranges strand |
>>>>        <Rle>              <IRanges>  <Rle> |
>>>> [1]     chr4 [160245588, 160247529]      * |
>>>> [2]     chr8 [ 19659689,  19660741]      * |
>>>> [3]    chr12 [ 63957497,  63958407]      * |
>>>> [4]    chr14 [ 98776505,  98778414]      * |
>>>> [5]    chr19 [ 12991686,  12994039]      * |
>>>>
>>>> This is just a toy example with 5 regions all on different
>>>> chromosomes, but with real data where there are multiple regions per
>>>> chromosome, I am unable to determine the resulting matched lifted data
>>>> for a particular region. Is there any way to preserve the ordering of
>>>> my original list in the liftOver output? Presorting by chromosome and
>>>> position might work 99% of the time, but the ordering of some regions
>>>> might shift during the liftOver, and I would not be able to tell if
>>>> this occurred.
>>>>
>>>>
>>> I think Kasper's suggestion of an ID column is a good one. The basic
>>> problem is that there is not necessarily a 1-1 correspondence after
>>> lift-over. A single region in, say, human could be broken up into
>>> multiple regions in mouse.
>>>
>>
>> An alternative would be that liftOver() returns a GRangesList instead
>> of GRanges. People who don't care about the exact mapping between
>> the input and the output could always do 'unlist(liftOver(g, chain))'
>> and get what they are getting right now.
>>
>> H.
>>
>>
>>> Michael
>>>
>>>> Thanks a lot,
>>>>
>>>> Andrew Jaffe
>>>>
>>>> _______________________________________________
>>>> Bioconductor mailing list
>>>> Bioconductor at r-project.org
>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>>> Search the archives:
>>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>>>
>>>>
>>>
>>
>>
>> --
>> Hervé Pagès
>>
>> Program in Computational Biology
>> Division of Public Health Sciences
>> Fred Hutchinson Cancer Research Center
>> 1100 Fairview Ave. N, M1-B514
>> P.O. Box 19024
>> Seattle, WA 98109-1024
>>
>> E-mail: hpages at fhcrc.org
>> Phone: (206) 667-5791
>> Fax: (206) 667-1319
>>
>
>
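
The bookkeeping discussed in the quoted messages (an ID column, or one list element per input region as in the proposed GRangesList return value) can be sketched in base R. This is only a minimal sketch under stated assumptions, not rtracklayer code: the plain list `lifted`, the IDs `r1`..`r3`, and all coordinates are made-up stand-ins for what a list-shaped liftOver result might look like. No liftOver call is made.

```r
# Hypothetical stand-in for a list-shaped lift-over result: one element per
# input region, kept in input order. IDs and coordinates are invented.
lifted <- list(
  r1 = data.frame(chrom = "chr4", start = 100L, end = 200L,
                  stringsAsFactors = FALSE),
  # r2 is broken into two pieces by the (imaginary) lift-over
  r2 = data.frame(chrom = c("chr8", "chr8"), start = c(10L, 60L),
                  end = c(50L, 90L), stringsAsFactors = FALSE),
  # r3 has no match in the target assembly
  r3 = data.frame(chrom = character(0), start = integer(0),
                  end = integer(0), stringsAsFactors = FALSE)
)

# Number of output pieces per input region.
n_pieces <- vapply(lifted, nrow, integer(1))

# Flatten to one table (the analogue of unlist() on a list of ranges) and
# attach an ID column that records which input each row came from.
flat <- do.call(rbind, unname(lifted))
flat$id <- rep(names(lifted), times = n_pieces)

# Inputs that were dropped or split can be read off the piece counts.
unmapped  <- names(n_pieces)[n_pieces == 0L]
split_ids <- names(n_pieces)[n_pieces > 1L]

print(flat)
```

The ID column is built with a single vectorized rep() call, so the mapping step itself involves no per-region loop. How the lift-over itself scales to millions of regions is a separate question that this sketch does not address.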