[BioC] [GenomicRanges] subsetByOverlaps to keep info from both GRanges objects?
Enrico Ferrero
enricoferrero86 at gmail.com
Tue Aug 20 12:51:50 CEST 2013
Hi,
I have two GRanges objects: the first is a list of SNPs and the second
a set of DNase hypersensitivity sites:
##########
library(GenomicRanges)
...
> snp
GRanges with 192 ranges and 1 metadata column:
seqnames ranges strand | score
<Rle> <IRanges> <Rle> | <integer>
rs000001 chr1 [ 37967779, 37967780] + | 0
rs000002 chr1 [165967416, 165967417] - | 0
rs000003 chr1 [218860069, 218860070] - | 0
rs000004 chr1 [ 17306673, 17306674] - | 0
rs000005 chr1 [ 41293414, 41293415] + | 0
... ... ... ... ... ...
rs000188 chr8 [ 97522507, 97522508] - | 0
rs000189 chr8 [ 15532582, 15532583] + | 0
rs000190 chr8 [ 72270031, 72270032] + | 0
rs000191 chr9 [126511086, 126511087] + | 0
rs000192 chr9 [ 98231008, 98231009] + | 0
---
seqlengths:
chr1 chr10 chr11 chr12 chr13 chr14 chr15 chr16 chr17 ... chr21
chr22 chr3 chr4 chr5 chr6 chr7 chr8 chr9
NA NA NA NA NA NA NA NA NA ... NA
NA NA NA NA NA NA NA NA
> dnase
GRanges with 145038 ranges and 1 metadata column:
seqnames ranges strand | score
<Rle> <IRanges> <Rle> | <integer>
[1] chr1 [ 10120, 10270] * | 0
[2] chr1 [237700, 237850] * | 0
[3] chr1 [521440, 521590] * | 0
[4] chr1 [565560, 565710] * | 0
[5] chr1 [565860, 566010] * | 0
... ... ... ... ... ...
[145034] chrX [154543640, 154543790] * | 0
[145035] chrX [154560420, 154560570] * | 0
[145036] chrX [154563960, 154564110] * | 0
[145037] chrX [154842100, 154842250] * | 0
[145038] chrX [154862200, 154862350] * | 0
---
seqlengths:
chr1 chr10 chr11 chr12 chr13 chr14 chr15 chr16 chr17 chr18 chr19
chr2 chr20 chr21 chr22 chr3 chr4 chr5 chr6 chr7 chr8 chr9 chrX
chrY
NA NA NA NA NA NA NA NA NA NA NA
NA NA NA NA NA NA NA NA NA NA NA NA
NA
##########
I can use subsetByOverlaps() in either direction to find the overlaps
between them; each call returns a GRanges object:
##########
> subsetByOverlaps(dnase, snp)
GRanges with 5 ranges and 1 metadata column:
seqnames ranges strand | score
<Rle> <IRanges> <Rle> | <integer>
[1] chr1 [ 17306560, 17306710] * | 0
[2] chr2 [169869820, 169869970] * | 0
[3] chr4 [145506440, 145506590] * | 0
[4] chr5 [ 15014080, 15014230] * | 0
[5] chr5 [ 15117400, 15117550] * | 0
---
seqlengths:
chr1 chr10 chr11 chr12 chr13 chr14 chr15 chr16 chr17 chr18 chr19
chr2 chr20 chr21 chr22 chr3 chr4 chr5 chr6 chr7 chr8 chr9 chrX
chrY
NA NA NA NA NA NA NA NA NA NA NA
NA NA NA NA NA NA NA NA NA NA NA NA
NA
> subsetByOverlaps(snp, dnase)
GRanges with 6 ranges and 1 metadata column:
seqnames ranges strand | score
<Rle> <IRanges> <Rle> | <integer>
rs2235746 chr1 [ 17306671, 17306672] - | 0
rs4157777 chr2 [169869904, 169869904] - | 0
rs6858330 chr4 [145506558, 145506559] + | 0
rs13146741 chr4 [145506453, 145506454] + | 0
rs32847 chr5 [ 15117438, 15117439] + | 0
rs7341842 chr5 [ 15014184, 15014185] + | 0
---
seqlengths:
chr1 chr10 chr11 chr12 chr13 chr14 chr15 chr16 chr17 chr18 chr19
chr2 chr21 chr22 chr3 chr4 chr5 chr6 chr7 chr8 chr9
NA NA NA NA NA NA NA NA NA NA NA
NA NA NA NA NA NA NA NA NA NA
##########
The first GRanges object stores the genomic locations of the DNase sites
that overlap the SNPs, while the second contains the SNP IDs (as GRanges
names) and the genomic locations of the SNPs that overlap the DNase
dataset.
What I actually need, though, is a single GRanges object that stores both
the SNP IDs and the DNase genomic locations. Is this possible?
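A minimal sketch of one way to do this, assuming a pairwise result (one range per overlapping SNP/DNase pair) is what is wanted: findOverlaps(), unlike subsetByOverlaps(), keeps the pairing between the two objects, so its hits can index dnase while the SNP IDs are copied across. The toy objects below reuse a few coordinates from the output above in place of the full snp and dnase objects; the name snp_dnase is only illustrative.

```r
library(GenomicRanges)

# Toy subset of the objects in the post: three SNPs that hit DNase sites,
# one that does not (rs000001), and one DNase site with no SNP in it.
snp <- GRanges(seqnames = c("chr1", "chr4", "chr4", "chr1"),
               ranges = IRanges(start = c(17306671, 145506558, 145506453, 37967779),
                                end   = c(17306672, 145506559, 145506454, 37967780)),
               strand = c("-", "+", "+", "+"),
               score = 0L)
names(snp) <- c("rs2235746", "rs6858330", "rs13146741", "rs000001")

dnase <- GRanges(seqnames = c("chr1", "chr4", "chr1"),
                 ranges = IRanges(start = c(17306560, 145506440, 10120),
                                  end   = c(17306710, 145506590, 10270)),
                 strand = "*",
                 score = 0L)

# findOverlaps() returns one hit per overlapping (SNP, DNase site) pair;
# queryHits() indexes 'snp' and subjectHits() indexes 'dnase'.
# A "*" strand is compatible with both "+" and "-", so strand does not
# block any of these overlaps.
hits <- findOverlaps(snp, dnase)

# Take the DNase range for every hit and label it with the SNP ID.
# A DNase site containing two SNPs appears twice, once per SNP.
snp_dnase <- dnase[subjectHits(hits)]
names(snp_dnase) <- names(snp)[queryHits(hits)]
snp_dnase
```

Applied to the full objects, this should give six ranges rather than the five returned by subsetByOverlaps(dnase, snp), because rs6858330 and rs13146741 both fall in the chr4 [145506440, 145506590] site.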
Thank you.
Best,
--
Enrico Ferrero
PhD Student
Department of Genetics
Cambridge Systems Biology Centre
University of Cambridge