[Bioc-devel] GenomicRanges: Storing 'seqlengths' as numeric
Hervé Pagès
hpages at fhcrc.org
Tue Dec 3 21:41:54 CET 2013
Hi Kasper,
On 12/03/2013 12:25 PM, Kasper Daniel Hansen wrote:
> Is integer.max dependent on 32bit vs 64bit?
I don't think so. AFAIK integers are always 32-bit in R (at least on
Intel platforms), even on 64-bit OSes. So .Machine$integer.max is
always 2^31 - 1 (roughly 2 billion).
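
A quick way to check this in any R session (a minimal sketch using base R only; the variable names 'int_max' and 'overflowed' are just for illustration):

```r
# Largest value an R integer can hold.
int_max <- .Machine$integer.max
print(int_max)                  # 2147483647
print(int_max == 2^31 - 1)      # TRUE

# Integer arithmetic past the limit gives NA rather than a wrapped value.
# R normally warns here; the warning is suppressed to keep the output short.
overflowed <- suppressWarnings(int_max + 1L)
print(is.na(overflowed))        # TRUE
```

Running the same lines on a 32-bit and a 64-bit build should give the same answers if the above is right.
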
> It seems to me that the OP
> specifically complains that he cannot represent 995*10^6 as an integer.
995*10^6 is roughly 1 billion so it can be represented as an integer,
except maybe on some exotic systems.
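
For the record, this is easy to verify (a small sketch in base R; 'wheat_3B_length' is a made-up name standing in for the roughly 995 Mbp length mentioned by the OP, not an exact chromosome size):

```r
# 995 * 10^6 is computed as a double, then converted to integer.
wheat_3B_length <- as.integer(995 * 10^6)

# The conversion succeeds: no NA, and the value is below the integer limit.
print(is.na(wheat_3B_length))                     # FALSE
print(wheat_3B_length <= .Machine$integer.max)    # TRUE

# Whole number of times this length fits into the integer range.
print(.Machine$integer.max %/% wheat_3B_length)   # 2
```

So a sequence would have to be more than twice as long as this one before the integer limit becomes a problem.
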
> Also, is there a sign issue here as well?
Not that I know of.
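
As far as I understand it, R integers are signed and the range is symmetric around zero, with the one value below -.Machine$integer.max used for NA. Since sequence lengths are never negative, only the upper limit matters here. A minimal sketch in base R (variable names are for illustration only):

```r
int_max <- .Machine$integer.max

# The most negative integer R can hold is the mirror image of the maximum.
most_negative <- -int_max
print(most_negative)            # -2147483647

# One step further down gives NA (the warning is suppressed here).
below <- suppressWarnings(most_negative - 1L)
print(is.na(below))             # TRUE
```
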
H.
>
>
> On Tue, Dec 3, 2013 at 2:53 PM, Hervé Pagès <hpages at fhcrc.org> wrote:
>
> Hi,
>
> Agreed with Martin that until someone comes up with a chromosome that
> is longer than .Machine$integer.max I don't see the need for switching
> to double or int64 to represent the seqlengths.
>
> Furthermore, since the seqlengths are used in many range operations
> like checking the validity of the ranges in a GRanges object, trimming
> them, computing coverage, handling circularity, etc... it would not
> make much sense to make the switch for the seqlengths without also
> making it for Ranges objects. That would be a serious undertaking though
> and probably with many backward compatibility issues.
>
> H.
>
>
>
> On 12/03/2013 10:07 AM, Martin Morgan wrote:
>
> On 12/03/2013 02:29 AM, Julian Gehring wrote:
>
> Hi,
>
> Some of the chromosomes out in the world are fairly large (e.g. wheat
> chr 3B with > 995 Mbp [1]). Currently, the 'seqlengths' of the reference
> sequence are stored as 'integers' which do not allow to store lengths of
> this size. Are there any plans of switching to 'doubles' or 64-bit
> integers for the 'seqlengths' slot? Or extending the slot such that a
> user can store it either as integer or floating-point number?
>
>
> But
>
> > .Machine$integer.max
> [1] 2147483647
>
> so we at least survive wheat chr 3B?
>
> If there is movement to support this I'd encourage exact representation
> as double (this is how R deals with long vectors, and I believe it is
> the javascript representation of integers so not completely
> unprecedented) rather than 64 bit integers (which do not have any
> support in R).
>
> I guess this would be quite a big undertaking so real use cases need to
> be present. And support for larger integers would seem to be useful to R
> generally rather than just to Bioc.
>
> Martin
>
>
> Best wishes
> Julian
>
>
> [1] http://www.sciencemag.org/content/322/5898/101
>
> _________________________________________________
> Bioc-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>
>
>
>
> --
> Hervé Pagès
>
> Program in Computational Biology
> Division of Public Health Sciences
>
> Fred Hutchinson Cancer Research Center
> 1100 Fairview Ave. N, M1-B514
> P.O. Box 19024
> Seattle, WA 98109-1024
>
> E-mail: hpages at fhcrc.org
> Phone: (206) 667-5791
> Fax: (206) 667-1319
>
>
--
Hervé Pagès
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024
E-mail: hpages at fhcrc.org
Phone: (206) 667-5791
Fax: (206) 667-1319