[BioC] Maximal length of Rle vectors
Hervé Pagès
hpages at fhcrc.org
Wed Nov 30 23:53:07 CET 2011
Hi Hans-Ulrich,
Thanks for the bug report. A fix is on its way: it will raise an
error whenever someone tries to create an Rle with length >
.Machine$integer.max.
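Below is a minimal base-R sketch of this kind of check. `check_rle_length()` is a hypothetical name, and this is not the actual IRanges implementation. The point is that the run lengths are summed in double precision, so the check itself cannot overflow:

```r
# Hypothetical helper (not the IRanges code): refuse run lengths whose
# total exceeds the largest representable integer.
check_rle_length <- function(run_lengths) {
  # Sum as doubles so the total is exact even above .Machine$integer.max.
  total <- sum(as.numeric(run_lengths))
  if (total > .Machine$integer.max)
    stop("Rle vector cannot be longer than .Machine$integer.max (",
         .Machine$integer.max, "); got ",
         format(total, scientific = FALSE))
  # Safe to store as an integer length from here on.
  invisible(as.integer(total))
}

check_rle_length(c(1000L, 2000L))                # fine, returns 3000
try(check_rle_length(c(1500000000L, 1500000000L)))  # error
```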
Allowing an Rle to have a length > .Machine$integer.max, even with
a warning, would cause all sorts of problems, the first of them being
that its length would be NA:
> Rle(1:2, c(1500000000, 1500000000))
'integer' Rle of length NA with 2 runs
Lengths: 1500000000 1500000000
Values : 1 2
Warning message:
In sum(runLength(x)) : Integer overflow - use sum(as.numeric(.))
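Both symptoms are consistent with 32-bit signed integer arithmetic. When R's sum() over integers exceeds .Machine$integer.max, it returns NA with the warning shown above. Unchecked 32-bit arithmetic wraps around modulo 2^32 instead, which turns 3e9 into -1294967296, the value in the report quoted below. The following base-R illustration uses `wrap_int32()`, a hypothetical helper that only emulates that wrap-around and is not part of IRanges:

```r
# Integer sum overflow: R gives NA (with a warning) instead of 3e9.
n <- c(1500000000L, 1500000000L)
s_int <- suppressWarnings(sum(n))
print(s_int)                 # NA

# Summing in double precision, as the warning suggests, is exact here.
s_dbl <- sum(as.numeric(n))
print(s_dbl)                 # 3e+09

# Hypothetical helper: emulate unchecked 32-bit signed wrap-around.
wrap_int32 <- function(x) ((x + 2^31) %% 2^32) - 2^31
print(wrap_int32(3e9))       # -1294967296
```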
Note that coverage across the human genome is best represented
by a named RleList (with one element per chromosome), which doesn't
have the .Machine$integer.max limitation. See the "GenomicRanges Use
Cases" vignette in the GenomicRanges package for an illustration of
this.
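The genome-wide mean can be computed chromosome by chromosome without ever building one long vector. This sketch stays package-free by using plain lists in place of RleList elements. The chromosome names and numbers are made up for illustration. Everything is accumulated as doubles, so neither the total length (3e9 here) nor the summed depth overflows:

```r
# Hypothetical per-chromosome coverage as run-length encodings
# (plain lists standing in for the elements of an RleList).
cov_runs <- list(
  chrA = list(lengths = c(1000000000L, 500000000L), values = c(0L, 10L)),
  chrB = list(lengths = c(1200000000L, 300000000L), values = c(5L, 20L))
)

# Mean depth over all bases, accumulated in double precision.
genome_mean <- function(runs) {
  total_bases <- 0
  total_depth <- 0
  for (chr in runs) {
    len <- as.numeric(chr$lengths)
    total_bases <- total_bases + sum(len)
    total_depth <- total_depth + sum(len * as.numeric(chr$values))
  }
  total_depth / total_bases
}

genome_mean(cov_runs)   # 17/3, i.e. about 5.67
```

With an actual RleList `cvg` returned by coverage(), the loop body would read `runLength(cvg[[i]])` and `runValue(cvg[[i]])` instead. That variant is an assumption and is not run here.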
Cheers,
H.
On 11-11-30 09:28 AM, Hans-Ulrich Klein wrote:
> Dear all,
>
> I observed this problem regarding the maximal length of an Rle vector:
>
> > rle = Rle(rep(0, 1000000000))
> > length(rle)
> [1] 1000000000
> > length(c(rle, rle, rle))
> [1] -1294967296
>
>
> Probably, it is caused by the maximum positive number (~2.1E9) that can
> be represented by an integer variable. However, there is no warning
> message.
> I noticed this problem when I wanted to calculate the average coverage
> of a sequencing project across the human genome. I used the coverage()
> method and then concatenated all chromosomes. This should give me an Rle
> vector of length ~3*10^9, but mean() does not work on that vector.
>
> Best,
> Hans-Ulrich
>
>
> > sessionInfo()
> R version 2.14.0 (2011-10-31)
> Platform: x86_64-pc-linux-gnu (64-bit)
>
> locale:
> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
> [7] LC_PAPER=C LC_NAME=C
> [9] LC_ADDRESS=C LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] stats graphics grDevices utils datasets methods base
>
> other attached packages:
> [1] IRanges_1.12.3
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
--
Hervé Pagès
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024
E-mail: hpages at fhcrc.org
Phone: (206) 667-5791
Fax: (206) 667-1319