[BioC] IRanges:::coverage() speedup/enhancement

Tue Dec 1 11:21:07 CET 2009

Chuck,
Thanks for the speedup to coverage for large-width IRanges objects. I 
checked in changes to IRanges 1.5.13 in BioC 2.6 (for use with R-devel) 
based on the code you submitted. I'll backport this code to BioC 2.5 (R 
2.10) in the next few days as well. Just out of curiosity, what is the 
source of these wide intervals, where do their weights come from, and 
what operations do you perform on the resulting coverage vectors?
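
For anyone reading along, here is a rough sketch of one common way to make
coverage cost independent of interval width: record +weight at each start and
-weight just past each end, then take a cumulative sum. This is only an
illustration of the general idea and is not necessarily what the submitted
coverage.c does. The name coverage_sketch is hypothetical, the code is base R
only, and it returns a plain numeric vector rather than an Rle.

```r
# Hypothetical helper, not part of IRanges: coverage via a difference array.
# Cost is roughly O(number of ranges + sequence length), whatever the widths.
coverage_sketch <- function(start, width, weight = 1) {
  stopifnot(length(start) > 0, all(start >= 1), all(width >= 1))
  width  <- rep(width,  length.out = length(start))
  weight <- rep(weight, length.out = length(start))
  end <- start + width - 1
  n <- max(end)

  # delta[i] holds the change in coverage when stepping onto position i.
  delta <- numeric(n + 1)
  for (i in seq_along(start)) {
    delta[start[i]]   <- delta[start[i]]   + weight[i]
    delta[end[i] + 1] <- delta[end[i] + 1] - weight[i]
  }

  # The running sum of the changes is the coverage at each position.
  cumsum(delta)[seq_len(n)]
}

# Small check: ranges [1,3] and [3,4] overlap only at position 3.
small <- coverage_sketch(start = c(1, 3), width = c(3, 2))
print(small)
```

The same function can be pointed at the toy example from the original request
(10000 starts spaced 1000 apart, width 1e7); the loop runs once per range, so
the width does not enter into the running time.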

Echoing Michael's comments, we haven't supported double precision 
weights in coverage calculations in the past because we hadn't 
encountered any common use cases for them and there is the workaround 
Michael mentioned if the need arose. Providing some context for the 
enhancement request would help motivate us to make a change. :)
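
For reference, the rounding workaround amounts to something like the sketch
below: scale the double weights by a power of 10, round to integers, compute
integer-weighted coverage, and divide by the scale afterwards. The helper name
scale_weights, the ranges, and the weights are all made up for illustration,
and a plain base R loop stands in for coverage() so the example runs without
IRanges. With IRanges one would presumably pass the integer weights as the
'weight' argument instead.

```r
# Hypothetical helper: turn double weights into integers at a fixed precision.
scale_weights <- function(w, digits = 2L) {
  scale <- 10^digits
  list(int = as.integer(round(w * scale)), scale = scale)
}

w  <- c(0.25, 0.5, 1/3)           # made-up down-weighting factors
sw <- scale_weights(w, digits = 2L)

# Made-up ranges: [1,4], [3,6], [5,8].
start <- c(1L, 3L, 5L)
end   <- c(4L, 6L, 8L)

# Naive integer-weighted coverage, standing in for coverage(x, weight = sw$int).
cov_int <- integer(max(end))
for (i in seq_along(start)) {
  idx <- start[i]:end[i]
  cov_int[idx] <- cov_int[idx] + sw$int[i]
}

# Undo the scaling to get back to the original units.
cov <- cov_int / sw$scale
print(cov)
```

Note that 1/3 becomes 0.33 at two digits, which is the sense in which the
workaround only goes so far: the precision is fixed by the chosen power of 10.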

Cheers,
Patrick

Michael Lawrence wrote:
> On Mon, Nov 30, 2009 at 11:10 AM, Charles C. Berry <cberry at tajo.ucsd.edu> wrote:
>
>   
>> The semantics of the IRanges package and especially the RangedData class
>> are very appropriate for some of the applications I deal with.
>>
>> Unfortunately, coverage() is too slow to be useful to me.
>>
>> I wonder if the Biocore Team would consider retooling it to make it
>> faster? Below I provide a link to a revised coverage.c that might suffice.
>>
>> The kind of case I need to handle has width values in the 10 kb to 10 Mb
>> range. As a toy example, I need to be able to run something like
>>
>>      tmp <- coverage( IRanges( start=seq(1,by=1000,length.out=10000),
>>                        width=1e7 ) )
>>
>> quickly.
>>
>> A revised version of coverage.c is available at
>>
>> http://cabig2.ucsd.edu:8080/Plone/Members/ccberry/software/coverage.c/view
>>
>> It will handle the case above almost instantly (while the existing version
>> needs about 8 minutes on my machine) and seems about equal to the
>> existing version for cases with width=30.  In the cases I've looked at
>> gc() reports the same memory usage.
>>
>> ---
>>
>> Also, I wonder if the Biocore Team would entertain allowing the 'weight'
>> argument of coverage to be of type double? This would help in cases in
>> which downweighting of counts of some genomic features is desired.
>>
>>
>>     
> In many use cases, it's probably sufficient to simply round floating point
> numbers to integers after multiplying by a power of 10. That only goes so
> far though, so supporting double-precision seems reasonable. The type of the
> output will simply depend on the type of the weights.
>
>
>
>   
>> Thanks,
>>
>> Chuck
>>
>> --
>> Charles C. Berry                            (858) 534-2098
>>                                             Dept of Family/Preventive Medicine
>> E mailto:cberry at tajo.ucsd.edu               UC San Diego
>> http://famprevmed.ucsd.edu/faculty/cberry/  La Jolla, San Diego 92093-0901
>>
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at stat.math.ethz.ch
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives:
>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>
>>     