[Bioc-devel] Additional summarizeOverlaps counting modes for ChIP-Seq

Ryan C. Thompson rct at thompsonclan.org
Wed Apr 30 23:41:01 CEST 2014


No, I forgot to attach the file. Here is the link:

https://www.dropbox.com/s/7qghtksl3mbvlsl/counting-modes.R

On Wed 30 Apr 2014 02:18:28 PM PDT, Valerie Obenchain wrote:
> Hi Ryan,
>
> These sound like great contributions. I didn't get an attachment - did
> you send one?
>
> Thanks.
> Valerie
>
> On 04/30/2014 01:06 PM, Ryan C. Thompson wrote:
>> Hi all,
>>
>> I recently asked about ways to do non-standard read counting in
>> summarizeOverlaps, and Martin Morgan directed me toward writing a custom
>> function to pass as the "mode" parameter. I have now written the custom
>> modes that I require for counting my ChIP-Seq reads, and I figured I
>> would contribute them back in case there was interest in merging them.
>>
>> The three main functions are "ResizeReads", "FivePrimeEnd", and
>> "ThreePrimeEnd". The first allows you to directionally extend or shorten
>> each read to the effective fragment length for the purpose of
>> determining overlaps. For example, if each read represents the 5-prime
>> end of a 150-bp fragment and you want to count these fragments using the
>> Union mode, you could do:
>>
>>      summarizeOverlaps(mode=ResizeReads(mode=Union, width=150, fix="start"), ...)
>>
>> Note that ResizeReads takes a mode argument. It returns a function (with
>> a closure storing the passed arguments) that performs the resizing (by
>> coercing reads to GRanges and calling "resize") and then dispatches to
>> the provided mode. (It probably needs to add a call to "match.fun"
>> somewhere.)
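
For anyone who does not want to open the file, here is a minimal sketch of the closure pattern described above. It is not the code from counting-modes.R: the argument defaults, the local name resize.args, and the toy ranges are assumptions made for illustration. It does include the match.fun call mentioned in the parenthetical.

```r
# Sketch only: defaults and helper names are assumptions, not the code
# from the linked counting-modes.R file.
library(GenomicAlignments)  # provides Union(); also attaches GenomicRanges

ResizeReads <- function(mode = Union, width = 1, fix = "start", ...) {
  mode <- match.fun(mode)   # accept either a function or its name
  resize.args <- list(width = width, fix = fix, ...)
  # The returned function looks like a counting mode; `mode` and
  # `resize.args` are kept in its enclosing environment.
  function(features, reads, ...) {
    reads <- as(reads, "GRanges")
    reads <- do.call(resize, c(list(reads), resize.args))
    mode(features, reads, ...)
  }
}

# Toy data (made up): two features and two 36-bp reads that only reach
# the first feature once extended to 150 bp in their own direction.
features <- GRanges("chr1", IRanges(start = c(100, 300), end = c(200, 400)))
reads <- GRanges("chr1", IRanges(start = c(50, 210), width = 36),
                 strand = c("+", "-"))

counts.raw     <- Union(features, reads)
counts.resized <- ResizeReads(mode = Union, width = 150,
                              fix = "start")(features, reads)
```

Here the mode is called directly on GRanges objects to keep the example small; in real use the function returned by ResizeReads would be handed to summarizeOverlaps as its mode argument.
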
>>
>> The other two functions are designed to count overlaps of only the read
>> ends. They are implemented internally using "ResizeReads" with width=1.
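
To make the width=1 case concrete, here is a small sketch of what resizing to a single base does on each strand. The reads are made-up toy data, and the variable names are only for illustration.

```r
library(GenomicRanges)

# Two toy 36-bp reads, one per strand (made-up coordinates).
reads <- GRanges("chr1",
                 IRanges(start = c(101, 301), width = 36),
                 strand = c("+", "-"))

# resize() is strand-aware: fix="start" keeps the 5-prime end,
# fix="end" keeps the 3-prime end.
five.prime  <- resize(reads, width = 1, fix = "start")
three.prime <- resize(reads, width = 1, fix = "end")

start(five.prime)   # 101 336
start(three.prime)  # 136 301
```
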
>>
>> The other three counting modes (the "*ExtraArgs" functions) are meant to
>> be used to easily construct new counting modes. Each function takes any
>> number of arguments and returns a counting mode that works like the
>> standard one of the same name, except that those arguments are passed as
>> extra args to "findOverlaps". For example, you could do Union mode with
>> a requirement for a minimum overlap of 10:
>>
>>      summarizeOverlaps(mode=UnionExtraArgs(minoverlap=10), ...)
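
A rough sketch of how such a constructor could be written, assuming the usual Union logic (count each read once, dropping reads that hit more than one feature when inter.feature is TRUE). This is an approximation for illustration, not the code in the linked file; the local names extra.args and hits.per.read and the toy ranges are assumptions.

```r
# Sketch only: approximates Union-style counting with extra arguments
# forwarded to findOverlaps(); not the code from counting-modes.R.
library(GenomicRanges)

UnionExtraArgs <- function(...) {
  extra.args <- list(...)   # e.g. minoverlap=10, stored in the closure
  function(features, reads, ignore.strand = FALSE, inter.feature = TRUE) {
    ov <- do.call(findOverlaps,
                  c(list(features, reads, ignore.strand = ignore.strand),
                    extra.args))
    if (inter.feature) {
      # Drop reads that overlap more than one feature.
      hits.per.read <- tabulate(subjectHits(ov), nbins = length(reads))
      ov <- ov[hits.per.read[subjectHits(ov)] == 1L]
    }
    # One count per feature.
    tabulate(queryHits(ov), nbins = length(features))
  }
}

# Toy data (made up): the first read overlaps feature 1 by only 6 bases.
features <- GRanges("chr1", IRanges(start = c(100, 300), end = c(200, 400)))
reads <- GRanges("chr1", IRanges(start = c(195, 150, 350), width = 20),
                 strand = "+")

counts.default <- UnionExtraArgs()(features, reads)
counts.min10   <- UnionExtraArgs(minoverlap = 10)(features, reads)
```

With the toy data, the read that overlaps by 6 bases is counted by the default mode but not once minoverlap=10 is applied.
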
>>
>> Note that these can be combined or "nested". For instance, you might
>> want a fragment length of 150 and a min overlap of 10:
>>
>>      myCountingMode <- ResizeReads(mode=UnionExtraArgs(minoverlap=10),
>>                                    width=150, fix="start")
>>      summarizeOverlaps(mode=myCountingMode, ...)
>>
>> Anyway, if you think any of these are worthy of inclusion in
>> Bioconductor, feel free to add them in. I'm not so sure about the
>> "nesting" idea, though. Functions that return functions (with states
>> saved in closures, which are then passed into another function) are
>> confusing for people who are not programmers by trade. Maybe
>> summarizeOverlaps should just gain an argument to pass args to
>> findOverlaps.
>>
>> -Ryan Thompson
>


