[Bioc-devel] How to speed up GRanges comparison

web working webworking at posteo.de
Wed Jan 29 16:49:40 CET 2020


Hello,

I have two large GRanges objects of the same length and want to test them
element by element: does the first range of query overlap the first range
of subject, does the second range of query overlap the second range of
subject, and so on. Here is an example of my problem:

# GRanges objects
library(GenomicRanges)
query <- GRanges(rep("chr1", 4), IRanges(c(1, 5, 9, 20), c(2, 6, 10, 22)), id=1:4)
subject <- GRanges(rep("chr1", 4), IRanges(c(3, 1, 1, 15), c(4, 2, 2, 21)), id=1:4)

# The 2 overlaps reported for the first query range should not be counted,
# because they involve subject ranges in different rows.
countOverlaps(query, subject)

# Approach 1 (bad style; simplified here for readability)
# Compute all-vs-all overlaps, then keep only hits where query row == subject row
dat <- as.data.frame(findOverlaps(query, subject))
indexDat <- apply(dat, 1, function(x) x[1]==x[2])
indexBool <- dat[indexDat,1]
out <- rep(FALSE, length(query))
out[indexBool] <- TRUE
as.numeric(out)
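Approach 1 can be written without apply() by comparing the hit indices directly. This is a minimal sketch using the accessors queryHits() and subjectHits() on the Hits object; it gives the same result, but it still computes every all-vs-all overlap first, so it may remain memory-hungry when many ranges overlap across different rows.

```r
library(GenomicRanges)

# Same toy data as in the example above
query   <- GRanges(rep("chr1", 4), IRanges(c(1, 5, 9, 20), c(2, 6, 10, 22)), id = 1:4)
subject <- GRanges(rep("chr1", 4), IRanges(c(3, 1, 1, 15), c(4, 2, 2, 21)), id = 1:4)

# All-vs-all overlaps, as in Approach 1
hits <- findOverlaps(query, subject)

# Keep only hits that pair element i of query with element i of subject
same_row <- queryHits(hits) == subjectHits(hits)

# 1 for every query index that has a same-row hit, 0 otherwise
out <- as.numeric(seq_along(query) %in% queryHits(hits)[same_row])
out
```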

# Approach 2 (bad style and takes too long)
out <- vector("numeric", length(query))
for(i in seq_along(query)) out[i] <- overlapsAny(query[i], subject[i])
out

# Approach 3 (wrong results)
as.numeric(overlapsAny(query, subject))
as.numeric(overlapsAny(split(query, 1:4), split(subject, 1:4)))


Does anyone have an idea how to speed this up?
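One possible direction, sketched here under stated assumptions rather than as a tested solution: because the comparison is strictly row-by-row, the overlap condition can be evaluated on the start/end vectors directly, without computing all-vs-all overlaps. The helper name pairwise_overlaps() is hypothetical; it assumes both objects have the same length, non-empty ranges, and that strand can be ignored (all ranges here are "*"). Recent Bioconductor releases may also provide a poverlaps() function for this kind of parallel comparison, depending on the installed IRanges/GenomicRanges version.

```r
library(GenomicRanges)

query   <- GRanges(rep("chr1", 4), IRanges(c(1, 5, 9, 20), c(2, 6, 10, 22)), id = 1:4)
subject <- GRanges(rep("chr1", 4), IRanges(c(3, 1, 1, 15), c(4, 2, 2, 21)), id = 1:4)

# Hypothetical helper: element-wise overlap test between two GRanges of
# equal length. Ignores strand and assumes non-empty, closed ranges.
pairwise_overlaps <- function(q, s) {
  stopifnot(length(q) == length(s))
  # Compare chromosome names as character so differing factor levels don't matter
  same_chrom <- as.character(seqnames(q)) == as.character(seqnames(s))
  # Two closed intervals overlap iff each starts no later than the other ends
  same_chrom & start(q) <= end(s) & start(s) <= end(q)
}

out <- as.numeric(pairwise_overlaps(query, subject))
out
```

Every step is a vectorized comparison over the whole objects, so the cost grows linearly with the number of ranges.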


Best,

Tobias


