[Bioc-devel] GenomicRanges: Storing 'seqlengths' as numeric

Hervé Pagès hpages at fhcrc.org
Tue Dec 3 21:41:54 CET 2013


Hi Kasper,

On 12/03/2013 12:25 PM, Kasper Daniel Hansen wrote:
> Is integer.max dependent on 32bit vs 64bit?

I don't think so. AFAIK integers are always 32-bit in R (at least on
Intel platforms), even on 64-bit OSes. So .Machine$integer.max is
always 2^31 - 1 (roughly 2 billion).
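
For what it's worth, this is easy to check at the R prompt. A minimal
sketch, assuming a standard R build (the variable names int_max and
overflowed are only illustrative):

```r
# Largest value an R integer can hold; R integers are 32-bit signed.
int_max <- .Machine$integer.max

# Compare against 2^31 - 1 (computed as a double, so no overflow here).
is_expected_limit <- (int_max == 2^31 - 1)

# Integer arithmetic past the limit does not wrap around: it yields NA
# (normally with a warning, suppressed here to keep the output quiet).
overflowed <- suppressWarnings(int_max + 1L)

print(int_max)
print(is_expected_limit)
print(is.na(overflowed))
```

Running the same lines on a 32-bit and a 64-bit build should give the
same answers, if the above is right.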

>  It seems to me that the OP
> specifically complains that he cannot represent 995*10^6 as an integer.

995*10^6 is roughly 1 billion so it can be represented as an integer,
except maybe on some exotic systems.
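
A quick way to convince oneself, sketched under the assumption that the
length in question is the 995 Mbp figure quoted for wheat chr 3B (the
names wheat_3B_length, headroom and too_big are made up for the
example):

```r
# The length quoted in the original post, coerced to an R integer.
wheat_3B_length <- as.integer(995 * 10^6)

# It survives the coercion: still an integer, not NA.
fits <- is.integer(wheat_3B_length) && !is.na(wheat_3B_length)

# How many times the quoted length fits below the integer limit.
headroom <- .Machine$integer.max / wheat_3B_length

# By contrast, a hypothetical 3 Gbp sequence would not fit:
# as.integer() returns NA (with a warning, suppressed here).
too_big <- suppressWarnings(as.integer(3e9))

print(fits)
print(headroom)
print(is.na(too_big))
```

So there is a bit more than a factor of 2 of headroom above the
quoted length before a single sequence stops fitting.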

>   Also, is there a sign issue here as well?

Not that I know of.

H.

>
>
> On Tue, Dec 3, 2013 at 2:53 PM, Hervé Pagès <hpages at fhcrc.org> wrote:
>
>     Hi,
>
>     Agreed with Martin that until someone comes up with a chromosome that
>     is longer than .Machine$integer.max I don't see the need for switching
>     to double or int64 to represent the seqlengths.
>
>     Furthermore, since the seqlengths are used in many range operations
>     like checking the validity of the ranges in a GRanges object, trimming
>     them, computing coverage, handling circularity, etc... it would not
>     make much sense to make the switch for the seqlengths without also
>     making it for Ranges objects. That would be a serious undertaking though
>     and probably with many backward compatibility issues.
>
>     H.
>
>
>
>     On 12/03/2013 10:07 AM, Martin Morgan wrote:
>
>         On 12/03/2013 02:29 AM, Julian Gehring wrote:
>
>             Hi,
>
>             Some of the chromosomes out in the world are fairly large
>             (e.g. wheat chr 3B with > 995 Mbp [1]).  Currently, the
>             'seqlengths' of the reference sequence are stored as
>             'integers', which cannot store lengths of this size.  Are
>             there any plans to switch to 'doubles' or 64-bit integers
>             for the 'seqlengths' slot?  Or to extend the slot so that
>             a user can store it either as an integer or as a
>             floating-point number?
>
>
>         But
>
>           > .Machine$integer.max
>         [1] 2147483647
>
>         so we at least survive wheat chr 3B?
>
>         If there is movement to support this I'd encourage exact
>         representation as double (this is how R deals with long
>         vectors, and I believe it is the javascript representation of
>         integers so not completely unprecedented) rather than 64 bit
>         integers (which do not have any support in R).
>
>         I guess this would be quite a big undertaking so real use cases
>         need to be present. And support for larger integers would seem
>         to be useful to R generally rather than just to Bioc.
>
>         Martin
>
>
>             Best wishes
>             Julian
>
>
>             [1] http://www.sciencemag.org/content/322/5898/101
>
>             _________________________________________________
>             Bioc-devel at r-project.org mailing list
>             https://stat.ethz.ch/mailman/listinfo/bioc-devel
>
>
>
>
>     --
>     Hervé Pagès
>
>     Program in Computational Biology
>     Division of Public Health Sciences
>
>     Fred Hutchinson Cancer Research Center
>     1100 Fairview Ave. N, M1-B514
>     P.O. Box 19024
>     Seattle, WA 98109-1024
>
>     E-mail: hpages at fhcrc.org
>     Phone: (206) 667-5791
>     Fax: (206) 667-1319
>
>
>
>

-- 
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpages at fhcrc.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319


