[BioC] IRanges::Rle and missing values

Patrick Aboyoun paboyoun at fhcrc.org
Sat Aug 21 02:03:38 CEST 2010


  Kasper,
I have addressed these two issues, which were caused by inappropriate
comparisons using NA_REAL at the C level for 'numeric' Rle objects. As
with the runmed function in the stats package, I don't currently support
missing values in the run* methods for Rle objects. Below is the current
behavior in IRanges 1.6.15 (BioC 2.6, R-2.11) and IRanges 1.7.21 (BioC
2.7, R-devel). I can add support for missing values. So that I can
prioritize this, when do you encounter missing values in your Rle vectors?

 > tmp = Rle(c(1,2,2,2,3,NA,NA,NA,NA,2,3,3,3,3,3,2))

 > tmp
'numeric' Rle of length 16 with 7 runs
   Lengths:  1  3  1  4  1  5  1
   Values :  1  2  3 NA  2  3  2
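
For readers wondering why the four NAs were split before the fix: in R, NA != NA evaluates to NA rather than FALSE, and base R's rle() likewise puts each NA in its own run. Below is a minimal base-R sketch of NA-aware run detection that reproduces the 7-run result above. The helper name rle_na is hypothetical, and this is not the IRanges C code, only an illustration of the idea of checking missingness before comparing values.

```r
# Hypothetical helper (not part of IRanges): run-length encode a numeric
# vector so that adjacent missing values collapse into a single run.
# Note: is.na() is TRUE for NaN as well, so NA and NaN share a run here.
rle_na <- function(x) {
  n <- length(x)
  if (n == 0L) return(list(lengths = integer(0), values = x))
  prev <- x[-n]
  cur  <- x[-1L]
  # A run boundary exists where the missingness status changes ...
  differs <- is.na(prev) != is.na(cur)
  # ... or where both neighbours are observed and unequal. Using `!=`
  # alone would yield NA whenever either neighbour is missing.
  both_ok <- !is.na(prev) & !is.na(cur)
  differs[both_ok] <- prev[both_ok] != cur[both_ok]
  ends <- c(which(differs), n)
  list(lengths = diff(c(0L, ends)), values = x[ends])
}

x <- c(1, 2, 2, 2, 3, NA, NA, NA, NA, 2, 3, 3, 3, 3, 3, 2)
enc <- rle_na(x)
enc$lengths  # 1 3 1 4 1 5 1
enc$values   # 1 2 3 NA 2 3 2
```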

 > runsum(tmp, 3)
Error in runsum(tmp, 3) : some values are NA, NaN, +/-Inf
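
To make the requested na.rm behavior concrete, here is a rough sketch on a plain numeric vector rather than an Rle. The function run_sum and its na.rm argument are hypothetical and are not the IRanges runsum method; it simply applies sum() to each window of width k, so it shows the semantics one might expect rather than an efficient implementation.

```r
# Hypothetical helper (not the IRanges runsum method): running sum over
# windows of width k on a plain numeric vector, optionally dropping NAs.
run_sum <- function(x, k, na.rm = FALSE) {
  n <- length(x)
  stopifnot(k >= 1L, k <= n)
  starts <- seq_len(n - k + 1L)
  vapply(starts,
         function(i) sum(x[i:(i + k - 1L)], na.rm = na.rm),
         numeric(1))
}

x <- c(1, 2, 2, 2, 3, NA, NA, NA, NA, 2, 3, 3, 3, 3, 3, 2)
strict  <- run_sum(x, 3)                # NA wherever a window touches an NA
lenient <- run_sum(x, 3, na.rm = TRUE)  # 5 6 7 5 3 0 0 2 5 8 9 9 9 8
```

One design question this raises: with na.rm = TRUE a window made up entirely of NAs sums to 0, which may or may not be what a user wants from a running summary.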

 > sessionInfo()
R version 2.12.0 Under development (unstable) (2010-08-01 r52659)
Platform: i386-apple-darwin9.8.0/i386 (32-bit)

locale:
[1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] IRanges_1.7.21



Patrick


On 8/20/10 9:43 AM, Patrick Aboyoun wrote:
>  Kasper,
> I'll take a look into this. The Rle constructor issue seems to be 
> isolated to 'numeric' and 'complex' Rles. I'll have an update out soon.
>
>
> Patrick
>
>
> On 8/20/10 8:53 AM, Kasper Daniel Hansen wrote:
>> Would it make sense to allow missing values in Rle objects and also to
>> incorporate removal of missing values in running summaries (and
>> possibly other functions)?
>>
>> Example:
>>
>>> tmp = Rle(c(1,2,2,2,3,NA,NA,NA,NA,2,3,3,3,3,3,2))
>>> tmp
>> 'numeric' Rle of length 16 with 10 runs
>>    Lengths:  1  3  1  1  1  1  1  1  5  1
>>    Values :  1  2  3 NA NA NA NA  2  3  2
>>
>> Seems like the run of 4 NA's is treated differently
>>
>>> runsum(tmp, k = 2)
>> 'numeric' Rle of length 15 with 11 runs
>>    Lengths:  1  2  1  1  1  1  1  1  1  4  1
>>    Values :  3  4  5 NA NA NA NA NA NA NA NA
>>
>> And there is no way to do runsum(..., na.rm = TRUE) like in sum (as
>> far as I can see).
>>
>> Kasper
>>
>>> sessionInfo()
>> R version 2.12.0 Under development (unstable) (2010-08-20 r52790)
>> Platform: x86_64-unknown-linux-gnu (64-bit)
>>
>> locale:
>>   [1] LC_CTYPE=en_US.iso885915       LC_NUMERIC=C
>>   [3] LC_TIME=en_US.iso885915        LC_COLLATE=en_US.iso885915
>>   [5] LC_MONETARY=C                  LC_MESSAGES=en_US.iso885915
>>   [7] LC_PAPER=en_US.iso885915       LC_NAME=C
>>   [9] LC_ADDRESS=C                   LC_TELEPHONE=C
>> [11] LC_MEASUREMENT=en_US.iso885915 LC_IDENTIFICATION=C
>>
>> attached base packages:
>> [1] grid      stats     graphics  grDevices datasets  utils     methods
>> [8] base
>>
>> other attached packages:
>> [1] multicore_0.1-3   IRanges_1.7.19    matrixStats_0.2.1 R.methodsS3_1.2.0
>> [5] ggplot2_0.8.8     proto_0.3-8       reshape_0.8.3     plyr_1.1
>>
>> loaded via a namespace (and not attached):
>> [1] tools_2.12.0
>>
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at stat.math.ethz.ch
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
>


