[Bioc-sig-seq] Operations between sets of ChIP-seq peak locations

Steve Lianoglou mailinglist.honeypot at gmail.com
Thu Jan 28 19:22:35 CET 2010


Hi Ivan,

On Thu, Jan 28, 2010 at 10:46 AM, Ivan Gregoretti <ivangreg at gmail.com> wrote:
> Hi everybody,
>
> I have two sets of ChIP-seq peaks: A and B.
> How do I select the peaks in A that are not overlapped by any peak in B?
>
>
> Now in more detail:
>
> I do
>
> library(rtracklayer)
> A <- import('condition_A.bed', 'bed')
> B <- import('condition_B.bed', 'bed')
>
> they are both RangedData instances.
>
> I need to compute A - B.
>
> How do you guys do it?

You could try to pull out the range data from `import`-ed bed files
and look at the documentation for ?setdiff and ?findOverlaps in the
IRanges package to get at the answer.

This is semi-untested, but it might look something like this:

A.range <- range(A)
B.range <- range(B)
o <- findOverlaps(A.range, B.range)

Assuming your "space" (chromosome) names are like 'chr1', 'chr2', etc,
I think something like:

A.range$chr1[-unique(queryHits(o$chr1))] will tell you the intervals
in A that don't overlap with intervals in B.

Maybe not the best way, but that could do it.

Using IRanges::setdiff might work too, but will return you unique
ranges in one object that aren't in the other. The subtlety here is
that if there is partial overlap between two ranges, it will just
remove the partial overlap of a range, and not the entirety of the
range that shares the overlap (ugh .. does that make sense?)

Anyway, just look at the ?findOverlaps and ?setdiff documentation in
the IRanges package, and you can likely find an even more intuitive
way to do it. I expect the @fhcrc folk will probably also chime in
with a better way soon, too ..

-steve

-- 
Steve Lianoglou
Graduate Student: Computational Systems Biology
 | Memorial Sloan-Kettering Cancer Center
 | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact



More information about the Bioc-sig-sequencing mailing list