[Bioc-devel] Additional summarizeOverlaps counting modes for ChIP-Seq

Ryan C. Thompson rct at thompsonclan.org
Wed Apr 30 22:06:25 CEST 2014


Hi all,

I recently asked about ways to do non-standard read counting in 
summarizeOverlaps, and Martin Morgan directed me toward writing a custom 
function to pass as the "mode" parameter. I have now written the custom 
modes that I require for counting my ChIP-Seq reads, and I figured I 
would contribute them back in case there was interest in merging them.

The three main functions are "ResizeReads", "FivePrimeEnd", and 
"ThreePrimeEnd". The first lets you directionally extend or shorten 
each read to the effective fragment length before overlaps are 
determined. For example, if each read represents the 5' end of a 
150-bp fragment and you want to count these fragments using the Union 
mode, you could do:

     summarizeOverlaps(mode=ResizeReads(mode=Union, width=150, fix="start"), ...)

Note that ResizeReads takes a mode argument. It returns a function (a 
closure storing the passed arguments) that resizes the reads (by 
coercing them to GRanges and calling "resize") and then dispatches to 
the provided mode. (It should probably also call "match.fun" on the mode 
argument, so that the mode can be given either as a function or by name.)
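To make the closure pattern concrete, here is a minimal sketch of a ResizeReads-style wrapper. It is not the code from this post. It assumes the GenomicRanges and GenomicAlignments packages, and it assumes that summarizeOverlaps calls a custom mode as mode(features, reads, ignore.strand=..., inter.feature=...), which matches the signature of the built-in Union mode. The toy features and reads are made up for illustration.

```r
library(GenomicRanges)      # GRanges(), IRanges(), resize()
library(GenomicAlignments)  # Union() counting mode, summarizeOverlaps()

## Sketch (not the original implementation): wrap a counting mode so that
## reads are resized, strand-aware, before overlaps are computed.
ResizeReads <- function(mode, width, fix = "start", ...) {
    mode <- match.fun(mode)   # accept Union or "Union"
    force(width)              # evaluate now so the closure stores the values
    force(fix)
    resize.args <- list(...)  # any further arguments for resize()
    function(features, reads, ignore.strand = FALSE, inter.feature = TRUE) {
        reads <- as(reads, "GRanges")  # e.g. GAlignments -> GRanges
        reads <- do.call(resize,
                         c(list(reads, width = width, fix = fix), resize.args))
        mode(features, reads, ignore.strand = ignore.strand,
             inter.feature = inter.feature)
    }
}

## Toy data: 36-bp "+" reads starting 100 and 200 bp upstream of two features.
features <- GRanges("chr1", IRanges(c(1000, 5000), width = 200), strand = "+")
reads <- GRanges("chr1", IRanges(c(900, 4800), width = 36), strand = "+")

Union(features, reads)                            # 0 0 (raw reads miss both)
ResizeReads(Union, width = 150)(features, reads)  # 1 0 (first fragment reaches)
```

Because width and fix are forced when ResizeReads is called, the returned function carries them along and can be handed to summarizeOverlaps like any built-in mode.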

The other two functions count overlaps using only the 5' or the 3' end 
of each read, respectively. They are implemented internally using 
"ResizeReads" with width=1.
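Both end-only modes rely on "resize" being strand-aware. A small check with made-up coordinates shows which base each anchor keeps. That FivePrimeEnd uses fix="start" and ThreePrimeEnd uses fix="end" is an assumption here, not something stated above.

```r
library(GenomicRanges)  # GRanges(), IRanges(), resize()

## The same 50-bp interval on each strand (toy coordinates).
reads <- GRanges("chr1", IRanges(start = c(101, 101), end = c(150, 150)),
                 strand = c("+", "-"))

## fix = "start" keeps the 5' base: start() on "+", end() on "-".
## A FivePrimeEnd-style mode (assumed) would resize reads like this.
five.prime <- resize(reads, width = 1, fix = "start")
start(five.prime)   # 101 150

## fix = "end" keeps the 3' base: end() on "+", start() on "-".
## A ThreePrimeEnd-style mode (assumed) would resize reads like this.
three.prime <- resize(reads, width = 1, fix = "end")
start(three.prime)  # 150 101
```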

The remaining three counting modes (the "*ExtraArgs" functions) are 
meant to make it easy to construct new counting modes. Each function 
takes any number of arguments and returns a counting mode that works 
like the standard mode of the same name, except that those arguments 
are passed as extra arguments to "findOverlaps". For example, you could 
use Union mode with a minimum overlap of 10 bp:

     summarizeOverlaps(mode=UnionExtraArgs(minoverlap=10), ...)
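A minimal sketch of how a UnionExtraArgs-style factory could work, assuming only GenomicRanges. This is not the code from this post: it re-implements the Union rule (with inter.feature=TRUE, a read that overlaps more than one feature is discarded as ambiguous) so that the captured arguments can be forwarded to "findOverlaps". The toy features and reads are invented.

```r
library(GenomicRanges)  # GRanges(), IRanges(), findOverlaps()

## Sketch (not the original implementation): a Union-like counting mode
## whose extra arguments are forwarded to findOverlaps().
UnionExtraArgs <- function(...) {
    extra.args <- list(...)  # e.g. list(minoverlap = 10)
    function(features, reads, ignore.strand = FALSE, inter.feature = TRUE) {
        ov <- do.call(findOverlaps,
                      c(list(features, reads, ignore.strand = ignore.strand),
                        extra.args))
        if (inter.feature) {
            ## Union rule: discard reads that hit more than one feature.
            hits.per.read <- tabulate(subjectHits(ov), nbins = length(reads))
            ov <- ov[hits.per.read[subjectHits(ov)] == 1L]
        }
        ## Count the remaining reads overlapping each feature.
        tabulate(queryHits(ov), nbins = length(features))
    }
}

## Toy data: features at 100-199 and 300-399, four unstranded reads.
features <- GRanges("chr1", IRanges(c(100, 300), width = 100))
reads <- GRanges("chr1", IRanges(start = c(190, 195, 150, 180),
                                 end   = c(229, 234, 169, 319)))
## Overlaps with feature 1: 10, 5, 20 and 20 bp; the last read also
## overlaps feature 2 by 20 bp, so it is ambiguous under Union.

UnionExtraArgs()(features, reads)                 # 3 0
UnionExtraArgs(minoverlap = 10)(features, reads)  # 2 0 (5-bp overlap dropped)
```

IntersectionStrict and IntersectionNotEmpty variants could follow the same factory pattern, each applying its own overlap rule after the forwarded "findOverlaps" call.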

Note that these can be combined or "nested". For instance, you might 
want a fragment length of 150 and a min overlap of 10:

     myCountingMode <- ResizeReads(mode=UnionExtraArgs(minoverlap=10),
                                   width=150, fix="start")
     summarizeOverlaps(mode=myCountingMode, ...)

Anyway, if you think any of these are worth including in Bioconductor, 
feel free to add them. I'm not so sure about the "nesting" idea, though. 
Functions that return functions (with state saved in closures, which are 
then passed into another function) are confusing for people who are not 
programmers by trade. Maybe summarizeOverlaps should simply gain an 
argument for passing extra arguments to findOverlaps.

-Ryan Thompson
