[BioC] Maximal length of Rle vectors

Hervé Pagès hpages at fhcrc.org
Wed Nov 30 23:53:07 CET 2011


Hi Hans-Ulrich,

Thanks for the bug report. A fix is on its way. It will raise an
error whenever one tries to create an Rle with length >
.Machine$integer.max.
Allowing an Rle to have a length > .Machine$integer.max, even with
a warning, would cause all sorts of problems, the first of them being
that its length would be NA:

   > Rle(1:2, c(1500000000, 1500000000))
   'integer' Rle of length NA with 2 runs
     Lengths: 1500000000 1500000000
     Values :          1          2
   Warning message:
   In sum(runLength(x)) : Integer overflow - use sum(as.numeric(.))
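
The warning above comes from summing the run lengths as integers. A
minimal sketch of the same overflow in base R only (no IRanges needed;
the variable names are made up for illustration):

```r
# Base R only; the names run_lengths, int_total and dbl_total are
# illustrative and not part of IRanges.
run_lengths <- c(1500000000L, 1500000000L)  # the two run lengths used above

# Integer summation overflows: R returns NA and emits a warning
# (suppressed here so the sketch runs quietly).
int_total <- suppressWarnings(sum(run_lengths))
print(is.na(int_total))

# Summing as doubles, as the warning suggests, avoids the overflow.
dbl_total <- sum(as.numeric(run_lengths))
print(dbl_total > .Machine$integer.max)
```

The double total is fine as a number, but it cannot be stored as an
integer length, which is why an error is the safer behavior.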

Note that the coverage across the human genome is best represented
by a named RleList (with one element per chromosome), which doesn't
have the .Machine$integer.max limitation on its total length. See the
"GenomicRanges Use Cases" vignette in the GenomicRanges package for an
illustration of this.
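
For the average coverage, there is no need to concatenate the
chromosomes. Here is a rough sketch of the idea in base R, where a
plain named list of run lengths and run values stands in for the
RleList (the list layout, the names and the toy numbers are
assumptions for illustration, not the IRanges API and not real data):

```r
# Hypothetical stand-in for a per-chromosome RleList: each element holds
# the run lengths and run values of one chromosome (toy numbers).
cov_by_chr <- list(
  chr1 = list(lengths = c(1500000000L, 500000000L), values = c(0, 10)),
  chr2 = list(lengths = c(1000000000L, 200000000L), values = c(2, 5))
)

# Per-chromosome length and summed coverage, accumulated as doubles so
# that nothing overflows .Machine$integer.max.
chr_len <- sapply(cov_by_chr, function(x) sum(as.numeric(x$lengths)))
chr_sum <- sapply(cov_by_chr, function(x) sum(as.numeric(x$lengths) * x$values))

# Genome-wide mean = total coverage / total number of positions.
genome_mean <- sum(chr_sum) / sum(chr_len)
print(genome_mean)
```

Each chromosome stays below the integer limit on its own, and only the
per-chromosome totals are combined, as doubles.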

Cheers,
H.


On 11-11-30 09:28 AM, Hans-Ulrich Klein wrote:
> Dear all,
>
> I observed this problem regarding the maximal length of a Rle vector:
>
>   >  rle = Rle(rep(0, 1000000000))
>   >  length(rle)
> [1] 1000000000
>   >  length(c(rle, rle, rle))
> [1] -1294967296
>
>
> Probably, it is caused by the maximum positive number (~2.1E9) that can
> be represented by an integer variable. However, there is no warning
> message.
> I noticed this problem when I wanted to calculate the average coverage
> of a sequencing project across the human genome. I used the coverage()
> method and then concatenated all chromosomes. This should give me an Rle
> vector of length ~3*10^9, but mean() does not work on that vector.
>
> Best,
> Hans-Ulrich
>
>
>   >  sessionInfo()
> R version 2.14.0 (2011-10-31)
> Platform: x86_64-pc-linux-gnu (64-bit)
>
> locale:
>    [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
>    [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
>    [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
>    [7] LC_PAPER=C                 LC_NAME=C
>    [9] LC_ADDRESS=C               LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
>
> other attached packages:
> [1] IRanges_1.12.3
>


-- 
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpages at fhcrc.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319


