[Bioc-devel] How to speed up GRange comparision

Pages, Herve hp@ge@ @end|ng |rom |redhutch@org
Thu Jan 30 20:10:17 CET 2020


Yes poverlaps() is a good option, as mentioned earlier. I was just 
commenting about doing something like:

   out <- vector("numeric", length(x))
   out[(which(foo(x))] <- 1

when you can just do:

   out <- foo(x)

H.

On 1/30/20 10:56, Michael Lawrence wrote:
> What happened to just poverlaps()?
> 
> On Thu, Jan 30, 2020 at 10:34 AM Pages, Herve <hpages using fredhutch.org> wrote:
>>
>> On 1/29/20 23:31, web working wrote:
>>> Hi Herve,
>>>
>>> Thank you for your answer. pcompare works fine for me. Here my solution:
>>>
>>> query <- GRanges(rep("chr1", 4), IRanges(c(1, 5, 9, 20), c(2, 6, 10, 22)))
>>> subject <- GRanges(rep("chr1",4), IRanges(c(3, 1, 1, 15), c(4, 2, 2, 21)))
>>> out <- vector("numeric", length(query))
>>> out[(which(abs(pcompare(query, subject))<5))] <- 1
>>> out
>>
>> Why not just
>>
>>     out <- abs(pcompare(query, subject)) < 5
>>
>> In any case you should use integer instead of numeric (twice more
>> compact in memory).
>>
>> H.
>>
>>>
>>> Carey was right that this here is off list. Next time I will pose my
>>> question on support.bioconductor.org
>>> <https://urldefense.proofpoint.com/v2/url?u=http-3A__support.bioconductor.org&d=DwMCaQ&c=eRAMFD45gAfqt84VtBcfhQ&r=BK7q3XeAvimeWdGbWY_wJYbW0WYiZvSXAJJKaaPhzWA&m=VZ2-jg7W_Ctrav2BpPVUPpvJlyISX3QVwFAzTnDnNTs&s=Jdmp3dD6ubzPdE8KjFJ3urOav62YTOmxYcYZ4000MY8&e=>.
>>>
>>> Best,
>>>
>>> Tobias
>>>
>>> Am 29.01.20 um 18:02 schrieb Pages, Herve:
>>>> Yes poverlaps().
>>>>
>>>> Or pcompare(), which should be even faster. But only if you are not
>>>> afraid to go low-level. See ?rangeComparisonCodeToLetter for the meaning
>>>> of the codes returned by pcompare().
>>>>
>>>> H.
>>>>
>>>> On 1/29/20 08:01, Michael Lawrence via Bioc-devel wrote:
>>>>> poverlaps()?
>>>>>
>>>>> On Wed, Jan 29, 2020 at 7:50 AM web working<webworking using posteo.de>  wrote:
>>>>>> Hello,
>>>>>>
>>>>>> I have two big GRanges objects and want to search for an overlap of  the
>>>>>> first range of query with the first range of subject. Then take the
>>>>>> second range of query and compare it with the second range of subject
>>>>>> and so on. Here an example of my problem:
>>>>>>
>>>>>> # GRanges objects
>>>>>> query <- GRanges(rep("chr1", 4), IRanges(c(1, 5, 9, 20), c(2, 6, 10,
>>>>>> 22)), id=1:4)
>>>>>> subject <- GRanges(rep("chr1",4), IRanges(c(3, 1, 1, 15), c(4, 2, 2,
>>>>>> 21)), id=1:4)
>>>>>>
>>>>>> # The 2 overlaps at the first position should not be counted, because
>>>>>> these ranges are at different rows.
>>>>>> countOverlaps(query, subject)
>>>>>>
>>>>>> # Approach 1 (bad style. I have simplified it to understand)
>>>>>> dat <- as.data.frame(findOverlaps(query, subject))
>>>>>> indexDat <- apply(dat, 1, function(x) x[1]==x[2])
>>>>>> indexBool <- dat[indexDat,1]
>>>>>> out <- rep(FALSE, length(query))
>>>>>> out[indexBool] <- TRUE
>>>>>> as.numeric(out)
>>>>>>
>>>>>> # Approach 2 (bad style and takes too long)
>>>>>> out <- vector("numeric", 4)
>>>>>> for(i in seq_along(query)) out[i] <- (overlapsAny(query[i], subject[i]))
>>>>>> out
>>>>>>
>>>>>> # Approach 3 (wrong results)
>>>>>> as.numeric(overlapsAny(query, subject))
>>>>>> as.numeric(overlapsAny(split(query, 1:4), split(subject, 1:4)))
>>>>>>
>>>>>>
>>>>>> Maybe someone has an idea to speed this up?
>>>>>>
>>>>>>
>>>>>> Best,
>>>>>>
>>>>>> Tobias
>>>>>>
>>>>>> _______________________________________________
>>>>>> Bioc-devel using r-project.org  mailing list
>>>>>> https://urldefense.proofpoint.com/v2/url?u=https-3A__stat.ethz.ch_mailman_listinfo_bioc-2Ddevel&d=DwICAg&c=eRAMFD45gAfqt84VtBcfhQ&r=BK7q3XeAvimeWdGbWY_wJYbW0WYiZvSXAJJKaaPhzWA&m=FSrHBK59_OMc6EbEtcPhkTVO0cfDgSbQBGFOXWyHhjc&s=3tZpvRAw7T5dP21u32TRTf4lZ4QFLtmkouKR7TUlJws&e=
>>>>>
>>
>> --
>> Hervé Pagès
>>
>> Program in Computational Biology
>> Division of Public Health Sciences
>> Fred Hutchinson Cancer Research Center
>> 1100 Fairview Ave. N, M1-B514
>> P.O. Box 19024
>> Seattle, WA 98109-1024
>>
>> E-mail: hpages using fredhutch.org
>> Phone:  (206) 667-5791
>> Fax:    (206) 667-1319
> 
> 
> 

-- 
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpages using fredhutch.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319


More information about the Bioc-devel mailing list