[Bioc-devel] Merging GInteraction/GenomicInteractions ranges
Luke Klein
lklei001 at ucr.edu
Tue Feb 12 20:34:01 CET 2019
Hello. I am planning to develop a new package which extends the GenomicInteractions package. I would like some help/advice on implementing the following functionality.
Consider the following GenomicInteractions object:
GenomicInteractions object with 10 interactions and 1 metadata column:
seqnames1 ranges1 seqnames2 ranges2 | counts
<Rle> <IRanges> <Rle> <IRanges> | <integer>
[1] chrA 1-2 --- chrA 9-10 | 1
[2] chrA 1-2 --- chrA 15-16 | 1
[3] chrA 3-4 --- chrA 3-4 | 1
[4] chrA 5-6 --- chrA 7-8 | 1
[5] chrA 5-6 --- chrA 9-10 | 1
[6] chrA 7-8 --- chrA 7-8 | 1
[7] chrA 7-8 --- chrA 11-12 | 1
[8] chrA 7-8 --- chrA 17-18 | 1
[9] chrA 9-10 --- chrA 9-10 | 1
[10] chrA 9-10 --- chrA 15-16 | 1
-------
regions: 8 ranges and 0 metadata columns
seqinfo: 1 sequence from an unspecified genome; no seqlengths
Visually, this is represented as follows: [figure not preserved in the archive]
I would like to do the following:
1) I want to group the regions into bins of WxW (in this case, W will be 3), as in a quad-tree structure <https://en.wikipedia.org/wiki/Quadtree>, except that the final groups are WxW (instead of 2x2). This will involve:
- iteratively dividing the matrix into quadrants {upper-left (0), upper-right (1), lower-left (2), lower-right (3)},
- labeling each subdivision in a new column until the final WxW resolution is reached,
- sorting by those columns.
GenomicInteractions object with 10 interactions and 3 metadata columns:
seqnames1 ranges1 seqnames2 ranges2 | counts quad1 quad2
<Rle> <IRanges> <Rle> <IRanges> | <integer> <integer> <integer>
[1] chrA 1-2 --- chrA 9-10 | 1 0 1
[2] chrA 1-2 --- chrA 15-16 | 1 1 0
[3] chrA 3-4 --- chrA 3-4 | 1 0 0
[4] chrA 5-6 --- chrA 7-8 | 1 0 1
[5] chrA 5-6 --- chrA 9-10 | 1 0 1
[6] chrA 7-8 --- chrA 7-8 | 1 0 3
[7] chrA 7-8 --- chrA 11-12 | 1 0 3
[8] chrA 7-8 --- chrA 17-18 | 1 1 2
[9] chrA 9-10 --- chrA 9-10 | 1 0 3
[10] chrA 9-10 --- chrA 15-16 | 1 1 2
-------
regions: 8 ranges and 0 metadata columns
seqinfo: 1 sequence from an unspecified genome; no seqlengths
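The quadrant labels in the table above can be reproduced with plain integer arithmetic. The following is a minimal base-R sketch, not GenomicInteractions code. `quad_labels` is a hypothetical helper, and it assumes a single chromosome, 1-based start coordinates, windows of `win_bp` = 6 bp (W = 3 regions of 2 bp each) aligned to position 1, anchor 1 indexing the matrix rows and anchor 2 the columns, and a grid padded to a power-of-two number of windows per side.

```r
# Anchor start coordinates copied from the example object above.
start1 <- c(1, 1, 3, 5, 5, 7, 7, 7, 9, 9)
start2 <- c(9, 15, 3, 7, 9, 7, 11, 17, 9, 15)

# Hypothetical helper (not part of GenomicInteractions).
# Returns one column of quadrant digits per level of the quad-tree.
quad_labels <- function(start1, start2, win_bp) {
  w1 <- (start1 - 1) %/% win_bp   # zero-based window index of anchor 1 (row)
  w2 <- (start2 - 1) %/% win_bp   # zero-based window index of anchor 2 (column)

  # Number of subdivisions needed so that 2^depth windows cover each side.
  depth <- 1
  while (2^depth < max(w1, w2) + 1) depth <- depth + 1

  quads <- matrix(0, nrow = length(w1), ncol = depth,
                  dimnames = list(NULL, paste0("quad", seq_len(depth))))
  for (k in seq_len(depth)) {
    shift <- 2^(depth - k)        # most significant bit first
    row_bit <- (w1 %/% shift) %% 2
    col_bit <- (w2 %/% shift) %% 2
    # 0 = upper-left, 1 = upper-right, 2 = lower-left, 3 = lower-right
    quads[, k] <- 2 * row_bit + col_bit
  }
  quads
}

quads <- quad_labels(start1, start2, win_bp = 6)
print(quads)
```

Under these assumptions the digits are simply the interleaved bits of the two window indices (a Z-order or Morton code), so no explicit recursion over the matrix is needed.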
Sorting by the two columns yields what I am after. Of course, I include the “quadX” columns for illustration only; in the implementation, I would like these columns hidden from the user.
GenomicInteractions object with 10 interactions and 3 metadata columns:
seqnames1 ranges1 seqnames2 ranges2 | counts quad1 quad2
<Rle> <IRanges> <Rle> <IRanges> | <integer> <integer> <integer>
[1] chrA 3-4 --- chrA 3-4 | 1 0 0
[2] chrA 1-2 --- chrA 9-10 | 1 0 1
[3] chrA 5-6 --- chrA 7-8 | 1 0 1
[4] chrA 5-6 --- chrA 9-10 | 1 0 1
[5] chrA 7-8 --- chrA 7-8 | 1 0 3
[6] chrA 7-8 --- chrA 11-12 | 1 0 3
[7] chrA 9-10 --- chrA 9-10 | 1 0 3
[8] chrA 1-2 --- chrA 15-16 | 1 1 0
[9] chrA 7-8 --- chrA 17-18 | 1 1 2
[10] chrA 9-10 --- chrA 15-16 | 1 1 2
-------
regions: 8 ranges and 0 metadata columns
seqinfo: 1 sequence from an unspecified genome; no seqlengths
The sorting gives me the quad-tree structure, and each unique quadrant sequence defines the group.
GenomicInteractions object with 10 interactions and 1 metadata column:
seqnames1 ranges1 seqnames2 ranges2 | counts
<Rle> <IRanges> <Rle> <IRanges> | <integer>
[1] chrA 3-4 --- chrA 3-4 | 1
[2] chrA 1-2 --- chrA 9-10 | 1
[3] chrA 5-6 --- chrA 7-8 | 1
[4] chrA 5-6 --- chrA 9-10 | 1
[5] chrA 7-8 --- chrA 7-8 | 1
[6] chrA 7-8 --- chrA 11-12 | 1
[7] chrA 9-10 --- chrA 9-10 | 1
[8] chrA 1-2 --- chrA 15-16 | 1
[9] chrA 7-8 --- chrA 17-18 | 1
[10] chrA 9-10 --- chrA 15-16 | 1
-------
regions: 8 ranges and 0 metadata columns
seqinfo: 1 sequence from an unspecified genome; no seqlengths
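The sort and the grouping by unique quadrant sequence can be sketched in base R as follows. This is only an illustration: the `quads` matrix is typed in by hand from the table above, and the names `ord`, `key` and `group` are made up for the example. It relies on `order()` leaving ties in their original order, which is what keeps the interactions within one group in their input order.

```r
# Quadrant digits copied from the illustrative quad1/quad2 columns above.
quads <- cbind(
  quad1 = c(0, 1, 0, 0, 0, 0, 0, 1, 0, 1),
  quad2 = c(1, 0, 0, 1, 1, 3, 3, 2, 3, 2)
)

# Sort by quad1, then quad2, ... for however many levels there are.
cols <- lapply(seq_len(ncol(quads)), function(j) quads[, j])
ord <- do.call(order, cols)

# One string per interaction, e.g. "0.1"; equal strings mean the same window.
key <- apply(quads[ord, , drop = FALSE], 1, paste, collapse = ".")

# Integer group id in order of first appearance after sorting.
group <- match(key, unique(key))

print(data.frame(original_row = ord, key = key, group = group))
```

The permutation `ord` could then be applied to the interactions object itself by ordinary subsetting, and the quadrant columns never need to be stored in the object's metadata.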
2) Then I would like to merge each WxW window (i.e. bin the regions), expanding the ranges accordingly and summing the counts. This process will:
- ***identify all range-pairs in the same window and merge them into a new range pair with appropriately expanded ranges*** (this is my primary goal)
- sum the counts for each of the aforementioned range-pairs (I have already figured out a way to do this)
GenomicInteractions object with 5 interactions and 1 metadata column:
seqnames1 ranges1 seqnames2 ranges2 | counts
<Rle> <IRanges> <Rle> <IRanges> | <integer>
[1] chrA 1-6 --- chrA 1-6 | 1
[2] chrA 1-6 --- chrA 7-12 | 3
[3] chrA 7-12 --- chrA 7-12 | 3
[4] chrA 1-6 --- chrA 13-18 | 1
[5] chrA 7-12 --- chrA 13-18 | 2
-------
regions: 3 ranges and 0 metadata columns
seqinfo: 1 sequence from an unspecified genome; no seqlengths
NOTE that ranges1 and ranges2 MUST expand so that the region width is 6, though the counts will only change if there exists another subrange covered by this bin/expansion that contains a positive count.
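One possible way to do the merge, sketched in base R on plain coordinate vectors rather than on the GenomicInteractions object itself. It assumes a single chromosome, 1-based coordinates, windows of `win_bp` = 6 bp aligned to position 1, and that no region straddles a window boundary. All variable names are made up for the illustration. Each anchor is snapped to the window that contains it, and counts are summed per window pair.

```r
# Coordinates and counts copied from the example object.
start1 <- c(1, 1, 3, 5, 5, 7, 7, 7, 9, 9)
start2 <- c(9, 15, 3, 7, 9, 7, 11, 17, 9, 15)
counts <- rep(1L, 10)
win_bp <- 6   # W = 3 regions of 2 bp each

# Zero-based window index of each anchor.
w1 <- (start1 - 1) %/% win_bp
w2 <- (start2 - 1) %/% win_bp
n_win <- max(w1, w2) + 1

# A single numeric key per (window1, window2) pair.
key <- w2 * n_win + w1
ukey <- sort(unique(key))
grp <- match(key, ukey)                 # 1..m, aligned with ukey

# rowsum() returns one row per group, in increasing order of grp.
total <- as.vector(rowsum(counts, grp))

# Decode the keys back into expanded ranges.
u1 <- ukey %% n_win
u2 <- ukey %/% n_win
merged <- data.frame(
  start1 = u1 * win_bp + 1, end1 = (u1 + 1) * win_bp,
  start2 = u2 * win_bp + 1, end2 = (u2 + 1) * win_bp,
  counts = total
)
print(merged)
```

Because the window of each anchor follows directly from its start coordinate, this sketch needs neither an overlap search nor the quadrant columns, and every step is vectorized. Whether that is fast enough on real data would need to be measured.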
As always, speed is a concern.
Best,
— Luke Klein
PhD Student
Department of Statistics
University of California, Riverside
lklei001 using ucr.edu