[R] fusion of overlapping intervals
Martin Morgan
mtmorgan at fhcrc.org
Mon Nov 5 18:23:12 CET 2012
On 11/05/2012 09:14 AM, Hermann Norpois wrote:
> Hello,
>
> I have start and end coordinates from different experiments (DNase
> hypersensitivity data) and now I would like to combine overlapping
> intervals. For instance (see my test data below) (2) 30-52 and (3) 49-101
> are combined to 30-101. But 49-101 and 70-103 would not be combined because
> they are on different chromosomes (chr a and chr b).
> Does anybody have an idea?
This data is very naturally handled by the "GRanges" class in Bioconductor's
GenomicRanges package:
source("http://bioconductor.org/biocLite.R")
biocLite("GenomicRanges")
library(GenomicRanges)
gr = GRanges(rep(c("a", "b"), each=3),
             IRanges(c(5, 30, 49, 70, 100, 129),
                     c(10, 52, 101, 103, 130, 140)),
             strand="*")
and then
> reduce(gr)
GRanges with 3 ranges and 0 metadata columns:
      seqnames    ranges strand
         <Rle> <IRanges>  <Rle>
  [1]        a [ 5,  10]      *
  [2]        a [30, 101]      *
  [3]        b [70, 140]      *
  ---
  seqlengths:
    a  b
   NA NA
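For comparison, the same per-chromosome merge can be sketched in base R, without Bioconductor. This is only an illustration of the idea, not how reduce() is implemented; the helper name merge_intervals is hypothetical, and it assumes a data frame with columns chr, start and end like the one in the question.

```r
# Hypothetical helper (not part of GenomicRanges): merge overlapping
# intervals within each chromosome using base R only.
merge_intervals <- function(df) {
  pieces <- lapply(split(df, as.character(df$chr)), function(d) {
    # Sort intervals by start (then end) within the chromosome
    d <- d[order(d$start, d$end), ]
    # Largest end coordinate seen among all earlier intervals
    prev_end <- c(-Inf, head(cummax(d$end), -1))
    # A new group begins when an interval starts after all earlier ones end
    grp <- cumsum(d$start > prev_end)
    data.frame(chr   = as.character(d$chr[1]),
               start = as.numeric(tapply(d$start, grp, min)),
               end   = as.numeric(tapply(d$end, grp, max)))
  })
  out <- do.call(rbind, pieces)
  rownames(out) <- NULL
  out
}

# Test data from the original question
df <- data.frame(chr   = rep(c("a", "b"), each = 3),
                 start = c(5, 30, 49, 70, 100, 129),
                 end   = c(10, 52, 101, 103, 130, 140))
merged <- merge_intervals(df)
print(merged)
```

On the test data this gives the same three intervals as reduce(gr) above. Note that the sketch merges only intervals that share at least one position, whereas the defaults of reduce() may also merge directly adjacent ranges, so the two can differ on other data.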
There are vignettes
vignette(package="GenomicRanges")
and additional training material, e.g.,
http://bioconductor.org/help/course-materials/2012/CSC2012/
If you pursue this solution then please follow up with questions on the
Bioconductor mailing list
http://bioconductor.org/help/mailing-list/
Martin
> Thanks
> Hermann
>
>> df
> chr start end
> 1 a 5 10
> 2 a 30 52
> 3 a 49 101
> 4 b 70 103
> 5 b 100 130
> 6 b 129 140
>> dput (df)
> structure(list(chr = structure(c(1L, 1L, 1L, 2L, 2L, 2L), .Label = c("a",
> "b"), class = "factor"), start = c(5, 30, 49, 70, 100, 129),
> end = c(10, 52, 101, 103, 130, 140)), .Names = c("chr", "start",
> "end"), row.names = c(NA, -6L), class = "data.frame")
--
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109
Location: Arnold Building M1 B861
Phone: (206) 667-2793