[Bioc-devel] Additional summarizeOverlaps counting modes for ChIP-Seq

Valerie Obenchain vobencha at fhcrc.org
Wed Apr 30 23:18:28 CEST 2014


Hi Ryan,

These sound like great contributions. I didn't get an attachment - did 
you send one?

Thanks.
Valerie

On 04/30/2014 01:06 PM, Ryan C. Thompson wrote:
> Hi all,
>
> I recently asked about ways to do non-standard read counting in
> summarizeOverlaps, and Martin Morgan directed me toward writing a custom
> function to pass as the "mode" parameter. I have now written the custom
> modes that I require for counting my ChIP-Seq reads, and I figured I
> would contribute them back in case there was interest in merging them.
>
> The main three functions are "ResizeReads", "FivePrimeEnd", and
> "ThreePrimeEnd". The first allows you to directionally extend or shorten
> each read to the effective fragment length for the purpose of
> determining overlaps. For example, if each read represents the 5-prime
> end of a 150-bp fragment and you want to count these fragments using the
> Union mode, you could do:
>
>      summarizeOverlaps(mode=ResizeReads(mode=Union, width=150,
> fix="start"), ...)
>
> Note that ResizeReads takes a mode argument. It returns a function (with
> a closure storing the passed arguments) that performs the resizing (by
> coercing reads to GRanges and calling "resize") and then dispatches to
> the provided mode. (It probably needs to add a call to "match.fun"
> somewhere.)
>
> The other two functions are designed to count overlaps of only the read
> ends. They are implemented internally using "ResizeReads" with width=1.
>
> The other three counting modes (the "*ExtraArgs" functions) are meant to
> be used to easily construct new counting modes. Each function takes any
> number of arguments and returns a counting mode that works like the
> standard one of the same name, except that those arguments are passed as
> extra args to "findOverlaps". For example, you could do Union mode with
> a requirement for a minimum overlap of 10:
>
>      summarizeOverlaps(mode=UnionExtraArgs(minoverlap=10), ...)
>
> Note that these can be combined or "nested". For instance, you might
> want a fragment length of 150 and a min overlap of 10:
>
>      myCountingMode <- ResizeReads(mode=UnionExtraArgs(minoverlap=10),
> width=150, fix="start")
>      summarizeOverlaps(mode=myCountingMode, ...)
>
> Anyway, if you think any of these are worthy of inclusion for
> BioConductor, feel free to add them in. I'm not so sure about the
> "nesting" idea, though. Functions that return functions (with states
> saved in closures, which are then passed into another function) are
> confusing for people who are not programmers by trade. Maybe
> summarizeOverlaps should just gain an argument to pass args to
> findOverlaps.
>
> -Ryan Thompson
>
> _______________________________________________
> Bioc-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/bioc-devel



More information about the Bioc-devel mailing list