This has got to be one of the most complex problems posed here.
Here is one approach, but it may be completely wrong:
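
library(GenomicRanges)  # also attaches IRanges and S4Vectors

hits_x <- findOverlaps(x)               # self-overlaps within x
hits_y <- findOverlaps(y)               # self-overlaps within y
hits_both <- intersect(hits_x, hits_y)  # index pairs that overlap in both
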
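hits_both now holds only the index pairs (i, j) where x[i] overlaps x[j] and
y[i] overlaps y[j], so I think the constraint of mutual overlap is satisfied.
In order to reduce, we have to think of the Hits object as a graph whose nodes
are the paired records; the connected components are the sets of ranges that
need to be reduced together.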
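
nodes <- as.character(seq_along(x))
# One edge list per node, given as node names; the factor levels keep an
# entry for every node even if it has no hits.
edges <- split(as.character(subjectHits(hits_both)),
               factor(queryHits(hits_both), levels=seq_along(x)))
g <- graph::graphNEL(nodes, edges)
comp <- unname(RBGL::connectedComp(g)) # unname just for efficiency
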
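To make the graph idea concrete, here is a toy sketch in base R only, with
plain integer intervals standing in for GRanges (one chromosome, no strand;
both are simplifying assumptions). The data are invented, and the helpers
`overlaps` and `components` are made up for this illustration; they are not
part of any package.

```r
# Invented toy data: pair i is (x[i, ], y[i, ]), closed integer intervals.
x <- data.frame(start = c(1, 5, 20, 8), end = c(6, 10, 25, 12))
y <- data.frame(start = c(100, 103, 101, 300), end = c(105, 110, 104, 310))

# overlaps(r)[i, j] is TRUE when intervals i and j of r overlap (i == j included).
overlaps <- function(r) {
  outer(r$start, r$end, "<=") & t(outer(r$start, r$end, "<="))
}

# Edge between pairs i and j only if they overlap in x AND in y.
adj <- overlaps(x) & overlaps(y)

# Connected components by label propagation: every node repeatedly takes the
# smallest label among its neighbours until no label changes.
components <- function(adj) {
  label <- seq_len(nrow(adj))
  repeat {
    new <- vapply(seq_len(nrow(adj)),
                  function(i) min(label[adj[i, ]]), integer(1))
    if (identical(new, label)) break
    label <- new
  }
  as.integer(factor(label))  # renumber components 1, 2, ...
}

comp_id <- components(adj)
print(comp_id)
```

Here pairs 1 and 2 end up in the same component. Pair 4 overlaps pair 2 in x
but not in y, and pair 3 overlaps pair 1 in y but not in x, so both stay on
their own. The dense matrix is only sensible for small inputs.
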
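That graphNEL constructor might kill your performance, though; there are
other ways to build the graph. Now we can form GRangesLists from the
connected components. Every range in a component is linked to the others
through a chain of overlaps, so (as long as the strands within a component
agree) the range of each component is a single interval. Just get the range,
and unlist them back to GRanges: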

ids <- as.integer(unlist(comp))
reduced_x <- unlist(range(relist(x[ids], comp)))
reduced_y <- unlist(range(relist(y[ids], comp)))

And combine the result into a DataFrame with the range counts:

# lengths() replaces elementLengths(), which was removed from later
# Bioconductor releases.
answer <- DataFrame(x=reduced_x, y=reduced_y, count=lengths(comp))

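As a sanity check on what the last two steps should compute, here is a base R
sketch on invented data, again with plain integer intervals instead of GRanges
(one chromosome, no strand). `comp` is hard-coded as a list of character
vectors of node names, which is my assumption about the shape of the
components list; `range_by_comp` is a made-up helper.

```r
# Invented toy data: four paired intervals.
x <- data.frame(start = c(1, 5, 20, 8), end = c(6, 10, 25, 12))
y <- data.frame(start = c(100, 103, 101, 300), end = c(105, 110, 104, 310))

# Components hard-coded for the example: pairs 1 and 2 belong together.
comp <- list(c("1", "2"), "3", "4")

# Range of each component: smallest start and largest end among its members.
range_by_comp <- function(r, comp) {
  t(vapply(comp, function(ids) {
    i <- as.integer(ids)
    c(start = min(r$start[i]), end = max(r$end[i]))
  }, c(start = 0, end = 0)))
}

rx <- range_by_comp(x, comp)
ry <- range_by_comp(y, comp)

# One row per component, with the number of original pairs it absorbed.
answer <- data.frame(x_start = rx[, "start"], x_end = rx[, "end"],
                     y_start = ry[, "start"], y_end = ry[, "end"],
                     count = lengths(comp))
print(answer)
```

The first row covers 1-10 in x and 100-110 in y with a count of 2; the other
two rows are the untouched singletons.
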
Hopefully that gets you closer. This hasn't been tested at all.
Michael
On Mon, Apr 7, 2014 at 11:40 AM, Niu, Liang (NIH/NIEHS) [E] <
liang.niu@nih.gov> wrote:
> Dear Herve,
>
> I have a question: suppose that I have two GRanges of the same length, say
> x and y, where x[i] and y[i] are two genomic ranges that are paired. What I
> want to do is to reduce x and y simultaneously, i.e., combine (x[i],y[i])
> and (x[j],y[j]) when x[i] overlaps with x[j] and y[i] overlaps with y[j].
> Ideally, I would like an output in the following format:
>
> range_x range_y count
> chr?:???-??? chr?:???-??? ?
>
> where the range_x and range_y are the reduced ranges, and count is the
> number of records associated with the chr?:???-??? and chr?:???-???. How
> can I do this?
>
> Thanks!
>
> Liang