[Bioc-sig-seq] uniqueFilter in the ShortRead package - Integer overflow - bug fix

Nora Rieber n.rieber at dkfz-heidelberg.de
Mon Feb 8 13:54:08 CET 2010


Dear Martin,

I was thrilled to discover the occurrenceFilter() function and tried
it out right away on my data.
However, I got this error and warning:

> aln_B[occurrenceFilter(withSread=FALSE, duplicates="head")(aln_B)]
Error in if (sum(q) != 0L) { : Missing value where TRUE/FALSE needed
Warning:
In sum(q) : integer overflow - use sum(as.numeric(,))


I had a look, modified the function code, and it seems that using

if (sum(as.numeric(q)) != 0L) {

instead of

if (sum(q) != 0L) {

fixes the problem. However, I'm not sure whether this change affects the
function's behaviour in any unwanted way.
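The failure can be reproduced in plain R. This is an illustration only: the vector `q` below is a made-up stand-in, because the thread does not show what `q` holds inside occurrenceFilter(). sum() over an integer vector returns an integer, so once the total exceeds .Machine$integer.max (2147483647) it becomes NA with a warning, and `if (NA != 0L)` then stops with the "missing value" error. Coercing to double first keeps the total exact for values of this size.

```r
# Made-up stand-in for the integer vector `q` inside occurrenceFilter();
# its total exceeds the largest 32-bit integer.
q <- c(.Machine$integer.max, 1L)

# Summing integers yields an integer, so the total overflows to NA.
# (suppressWarnings() hides the "integer overflow" warning here.)
s_int <- suppressWarnings(sum(q))
is.na(s_int)   # TRUE

# if (NA != 0L) is an error ("missing value where TRUE/FALSE needed").
errored <- tryCatch({
  if (s_int != 0L) invisible(NULL)
  FALSE
}, error = function(e) TRUE)
errored        # TRUE

# Coercing to double first keeps the total exact at this size.
s_dbl <- sum(as.numeric(q))
s_dbl          # 2147483648
s_dbl != 0L    # TRUE
```

Whenever the integer total fits in range, sum(as.numeric(q)) should give the same number, so the `!= 0L` test behaves as before; the change only matters for totals that previously overflowed to NA.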

Best wishes,
Nora

> Date: Sun, 07 Feb 2010 05:56:18 -0800
> From: Martin Morgan <mtmorgan at fhcrc.org>
> To: Jason Lu <jasonlu68 at gmail.com>
> Cc: bioc-sig-sequencing at r-project.org
> Subject: Re: [Bioc-sig-seq] uniqueFilter in the ShortRead package
> Message-ID: <4B6EC682.4000401 at fhcrc.org>
> Content-Type: text/plain; charset=ISO-8859-1
>
> Hi Jason --
>
> On 02/05/2010 11:30 AM, Jason Lu wrote:
>   
>> Hi,
>>
>> I have been using the ShortRead package with my sequencing data. It has been
>> making my life a lot easier.
>>
>> One thing I noticed is that the logic in the uniqueFilter function seems to be
>> problematic.
>>
>> The original function is:
>> function (withSread = TRUE, .name = "UniqueFilter")
>> {
>>     .check_type_and_length(withSread, "logical", 1)
>>     srFilter(function(x) {
>>         if (withSread)
>>             !srduplicated(x)
>>         else {
>>             !(duplicated(position(x)) & duplicated(strand(x)) &
>>                 duplicated(chromosome(x)))
>>         }
>>     }, name = .name)
>> }
>>
>> If withSread = FALSE, the else part seems to filter out lots of reads I
>> would like to keep.
>>
>> My simple solution is to change the else branch to:
>> !(duplicated(paste(position(x), strand(x), chromosome(x), sep = ";")))
>>
>> I may have misused the function though.
>>     
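Why the original else branch drops so many reads: each duplicated() call inspects one field in isolation, so a read is flagged as soon as its position, its strand and its chromosome have each appeared earlier, even if they came from three different reads. The base-R sketch below uses hypothetical vectors `pos`, `strand` and `chrom` in place of position(x), strand(x) and chromosome(x) to compare that logic with the pasted-key fix.

```r
# Hypothetical stand-ins for position(x), strand(x) and chromosome(x)
# of four aligned reads.
pos    <- c(100L, 200L, 200L, 100L)
strand <- c("+",  "-",  "+",  "+")
chrom  <- c("chr1", "chr2", "chr1", "chr1")

# Original else branch: each duplicated() looks at one field on its own,
# so read 3 is dropped because 200, "+" and "chr1" each occurred earlier,
# albeit in different reads.
keep_per_field <- !(duplicated(pos) & duplicated(strand) & duplicated(chrom))
keep_per_field   # TRUE TRUE FALSE FALSE

# Pasted-key fix: one key per read, so the three fields are compared
# together; only read 4, an exact repeat of read 1, is dropped.
keep_joint <- !duplicated(paste(pos, strand, chrom, sep = ";"))
keep_joint       # TRUE TRUE TRUE FALSE

# Equivalent without string pasting: duplicated() on a data.frame
# compares whole rows.
keep_rows <- !duplicated(data.frame(pos, strand, chrom))
all(keep_rows == keep_joint)   # TRUE
```

The `;` separator is only safe because none of the fields can contain it; comparing data.frame rows avoids relying on that.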
>
> Technically the function works as documented, but you're right that this
> is not usually what one wants. I've implemented occurrenceFilter() in
> the development version of ShortRead (look for version >= 1.5.14), which
> does what you want with withSread=FALSE. It is also meant to allow more
> flexible filtering, e.g., keeping reads represented at least min and at
> most max times, and treating sets of duplicate reads differently (e.g.,
> discarding all duplicates rather than keeping the first).
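The occurrence-based filtering described above can be sketched in base R. The function `occurrence_keep()` below is hypothetical and is not ShortRead's occurrenceFilter(); its `min`, `max` and `duplicates` arguments only mimic the ideas in the paragraph: keep reads whose key occurs between min and max times, then either keep the first read of each duplicate set ("head") or discard every read that has duplicates ("none"). The thread does not say how occurrenceFilter() combines these options, so the semantics here are an assumption.

```r
# Hypothetical helper, NOT ShortRead's occurrenceFilter(): given one key
# per read, return a logical vector saying which reads to keep.
occurrence_keep <- function(key, min = 1L, max = .Machine$integer.max,
                            duplicates = c("head", "none")) {
  duplicates <- match.arg(duplicates)
  # Number of times each read's key occurs in the whole set.
  counts <- table(key)
  n <- as.integer(counts[as.character(key)])
  # Reads whose key occurs too rarely or too often are dropped outright.
  in_range <- n >= min & n <= max
  if (duplicates == "head") {
    # Keep only the first read of each set of identical keys.
    in_range & !duplicated(key)
  } else {
    # "none": discard every read whose key occurs more than once.
    in_range & n == 1L
  }
}

key <- c("a", "b", "a", "c", "b", "b")
occurrence_keep(key)                       # TRUE TRUE FALSE TRUE FALSE FALSE
occurrence_keep(key, duplicates = "none")  # FALSE FALSE FALSE TRUE FALSE FALSE
occurrence_keep(key, min = 2L, max = 2L)   # TRUE FALSE FALSE FALSE FALSE FALSE
```

For aligned reads, `key` could be built from position, strand and chromosome, as in Jason's pasted key.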
>
> I deprecated uniqueFilter in the development branch, which means that it
> still works but will be removed in a future release.
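In R, deprecation is typically implemented with base R's .Deprecated(), which issues a warning naming the replacement while the old function keeps working. The sketch below shows that pattern with hypothetical functions `old_filter()` and `new_filter()`; it is not the actual ShortRead code for uniqueFilter().

```r
# Hypothetical replacement function (a trivial stand-in, not ShortRead code).
new_filter <- function(x) !duplicated(x)

# Hypothetical deprecated function: still returns a result, but each call
# warns and points users to the replacement via base R's .Deprecated().
old_filter <- function(x) {
  .Deprecated("new_filter")
  new_filter(x)
}

# The old function keeps working ...
res <- suppressWarnings(old_filter(c(1, 1, 2)))
res   # TRUE FALSE TRUE

# ... but calling it raises a deprecation warning naming the replacement.
msg <- tryCatch(old_filter(1), warning = conditionMessage)
cat(msg, "\n")
```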
>
> Thanks for the report.
>
> Martin
>
>   
>>> sessionInfo()
>>>       
>> R version 2.10.0 (2009-10-26)
>> x86_64-redhat-linux-gnu
>>
>> locale:
>>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
>>  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
>>  [5] LC_MONETARY=C              LC_MESSAGES=en_US.UTF-8
>>  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
>>  [9] LC_ADDRESS=C               LC_TELEPHONE=C
>> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>>
>> attached base packages:
>> [1] stats     graphics  grDevices utils     datasets  methods   base
>>
>> other attached packages:
>> [1] ShortRead_1.4.0   lattice_0.17-26   BSgenome_1.14.2   Biostrings_2.14.8
>> [5] IRanges_1.4.9
>>
>> loaded via a namespace (and not attached):
>> [1] Biobase_2.6.1 grid_2.10.0   hwriter_1.1   tools_2.10.0
>>     
>> Thanks,
>> Jason
>>
>> _______________________________________________
>> Bioc-sig-sequencing mailing list
>> Bioc-sig-sequencing at r-project.org
>> https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
>>     
>
>
> --
> Martin Morgan
> Computational Biology / Fred Hutchinson Cancer Research Center
> 1100 Fairview Ave. N.
> PO Box 19024 Seattle, WA 98109
>
> Location: Arnold Building M1 B861
> Phone: (206) 667-2793
>
>
>   