[Bioc-devel] How to speed up GRange comparision

Michael Lawrence |@wrence@m|ch@e| @end|ng |rom gene@com
Thu Jan 30 19:56:20 CET 2020


What happened to just poverlaps()?

On Thu, Jan 30, 2020 at 10:34 AM Pages, Herve <hpages using fredhutch.org> wrote:
>
> On 1/29/20 23:31, web working wrote:
> > Hi Herve,
> >
> > Thank you for your answer. pcompare works fine for me. Here my solution:
> >
> > query <- GRanges(rep("chr1", 4), IRanges(c(1, 5, 9, 20), c(2, 6, 10, 22)))
> > subject <- GRanges(rep("chr1",4), IRanges(c(3, 1, 1, 15), c(4, 2, 2, 21)))
> > out <- vector("numeric", length(query))
> > out[(which(abs(pcompare(query, subject))<5))] <- 1
> > out
>
> Why not just
>
>    out <- abs(pcompare(query, subject)) < 5
>
> In any case you should use integer instead of numeric (twice more
> compact in memory).
>
> H.
>
> >
> > Carey was right that this here is off list. Next time I will pose my
> > question on support.bioconductor.org
> > <https://urldefense.proofpoint.com/v2/url?u=http-3A__support.bioconductor.org&d=DwMCaQ&c=eRAMFD45gAfqt84VtBcfhQ&r=BK7q3XeAvimeWdGbWY_wJYbW0WYiZvSXAJJKaaPhzWA&m=VZ2-jg7W_Ctrav2BpPVUPpvJlyISX3QVwFAzTnDnNTs&s=Jdmp3dD6ubzPdE8KjFJ3urOav62YTOmxYcYZ4000MY8&e=>.
> >
> > Best,
> >
> > Tobias
> >
> > Am 29.01.20 um 18:02 schrieb Pages, Herve:
> >> Yes poverlaps().
> >>
> >> Or pcompare(), which should be even faster. But only if you are not
> >> afraid to go low-level. See ?rangeComparisonCodeToLetter for the meaning
> >> of the codes returned by pcompare().
> >>
> >> H.
> >>
> >> On 1/29/20 08:01, Michael Lawrence via Bioc-devel wrote:
> >>> poverlaps()?
> >>>
> >>> On Wed, Jan 29, 2020 at 7:50 AM web working<webworking using posteo.de>  wrote:
> >>>> Hello,
> >>>>
> >>>> I have two big GRanges objects and want to search for an overlap of  the
> >>>> first range of query with the first range of subject. Then take the
> >>>> second range of query and compare it with the second range of subject
> >>>> and so on. Here an example of my problem:
> >>>>
> >>>> # GRanges objects
> >>>> query <- GRanges(rep("chr1", 4), IRanges(c(1, 5, 9, 20), c(2, 6, 10,
> >>>> 22)), id=1:4)
> >>>> subject <- GRanges(rep("chr1",4), IRanges(c(3, 1, 1, 15), c(4, 2, 2,
> >>>> 21)), id=1:4)
> >>>>
> >>>> # The 2 overlaps at the first position should not be counted, because
> >>>> these ranges are at different rows.
> >>>> countOverlaps(query, subject)
> >>>>
> >>>> # Approach 1 (bad style. I have simplified it to understand)
> >>>> dat <- as.data.frame(findOverlaps(query, subject))
> >>>> indexDat <- apply(dat, 1, function(x) x[1]==x[2])
> >>>> indexBool <- dat[indexDat,1]
> >>>> out <- rep(FALSE, length(query))
> >>>> out[indexBool] <- TRUE
> >>>> as.numeric(out)
> >>>>
> >>>> # Approach 2 (bad style and takes too long)
> >>>> out <- vector("numeric", 4)
> >>>> for(i in seq_along(query)) out[i] <- (overlapsAny(query[i], subject[i]))
> >>>> out
> >>>>
> >>>> # Approach 3 (wrong results)
> >>>> as.numeric(overlapsAny(query, subject))
> >>>> as.numeric(overlapsAny(split(query, 1:4), split(subject, 1:4)))
> >>>>
> >>>>
> >>>> Maybe someone has an idea to speed this up?
> >>>>
> >>>>
> >>>> Best,
> >>>>
> >>>> Tobias
> >>>>
> >>>> _______________________________________________
> >>>> Bioc-devel using r-project.org  mailing list
> >>>> https://urldefense.proofpoint.com/v2/url?u=https-3A__stat.ethz.ch_mailman_listinfo_bioc-2Ddevel&d=DwICAg&c=eRAMFD45gAfqt84VtBcfhQ&r=BK7q3XeAvimeWdGbWY_wJYbW0WYiZvSXAJJKaaPhzWA&m=FSrHBK59_OMc6EbEtcPhkTVO0cfDgSbQBGFOXWyHhjc&s=3tZpvRAw7T5dP21u32TRTf4lZ4QFLtmkouKR7TUlJws&e=
> >>>
>
> --
> Hervé Pagès
>
> Program in Computational Biology
> Division of Public Health Sciences
> Fred Hutchinson Cancer Research Center
> 1100 Fairview Ave. N, M1-B514
> P.O. Box 19024
> Seattle, WA 98109-1024
>
> E-mail: hpages using fredhutch.org
> Phone:  (206) 667-5791
> Fax:    (206) 667-1319



-- 
Michael Lawrence
Senior Scientist, Bioinformatics and Computational Biology
Genentech, A Member of the Roche Group
Office +1 (650) 225-7760
michafla using gene.com

Join Genentech on LinkedIn | Twitter | Facebook | Instagram | YouTube



More information about the Bioc-devel mailing list