[Bioc-devel] GenomicRanges: Storing 'seqlengths' as numeric

Hervé Pagès hpages at fhcrc.org
Tue Dec 3 20:53:16 CET 2013


Hi,

I agree with Martin: until someone comes up with a chromosome that is
longer than .Machine$integer.max, I don't see the need for switching
to double or int64 to represent the seqlengths.
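
For what it's worth, the headroom can be checked in base R. This is only
a minimal sketch: the 995e6 figure is the rough lower bound quoted below
for wheat chr 3B, not an exact length, and `chr3B_len` is just an
illustrative name.

```r
# Approximate lower bound for wheat chr 3B quoted in this thread (~995 Mbp).
chr3B_len <- 995e6

# Largest value an R integer can hold: 2^31 - 1.
.Machine$integer.max

# The length is below the limit, so as.integer() converts it without NA.
chr3B_len <= .Machine$integer.max
as.integer(chr3B_len)

# Remaining headroom: integer.max is a bit more than twice this length.
.Machine$integer.max / chr3B_len
```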

Furthermore, the seqlengths are used in many range operations: checking
the validity of the ranges in a GRanges object, trimming them, computing
coverage, handling circularity, etc. So it would not make much sense to
switch the seqlengths without also switching Ranges objects. That would
be a serious undertaking, though, and would probably raise many backward
compatibility issues.
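
To illustrate why the seqlengths alone would not be enough, here is a
rough base R sketch of what happens to integer coordinates near the
limit. The `start` and `width` values are made up for the example and
are not taken from any real object.

```r
# Hypothetical integer coordinates close to the upper limit.
start <- .Machine$integer.max - 10L
width <- 100L

# Integer arithmetic that overflows yields NA (with a warning),
# not a larger number.
end_int <- suppressWarnings(start + width - 1L)
is.na(end_int)

# The same computation in double is exact, since doubles represent
# every integer up to 2^53 exactly.
end_dbl <- as.numeric(start) + width - 1
end_dbl
```

So a seqlength stored as double could describe a sequence whose far end
integer-based ranges would still be unable to reach.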

H.


On 12/03/2013 10:07 AM, Martin Morgan wrote:
> On 12/03/2013 02:29 AM, Julian Gehring wrote:
>> Hi,
>>
>> Some of the chromosomes out in the world are fairly large (e.g. wheat
>> chr 3B with > 995 Mbp [1]).  Currently, the 'seqlengths' of the
>> reference sequence are stored as 'integers', which cannot hold lengths
>> of this size.  Are there any plans to switch to 'doubles' or 64-bit
>> integers for the 'seqlengths' slot?  Or to extend the slot so that a
>> user can store it either as an integer or as a floating-point number?
>
> But
>
>  > .Machine$integer.max
> [1] 2147483647
>
> so we at least survive wheat chr 3B?
>
> If there is movement to support this I'd encourage exact representation
> as double (this is how R deals with long vectors, and I believe it is
> the JavaScript representation of integers, so not completely
> unprecedented) rather than 64-bit integers (which do not have any
> support in R).
>
> I guess this would be quite a big undertaking so real use cases need to
> be present. And support for larger integers would seem to be useful to R
> generally rather than just to Bioc.
>
> Martin
>
>>
>> Best wishes
>> Julian
>>
>>
>> [1] http://www.sciencemag.org/content/322/5898/101
>>
>> _______________________________________________
>> Bioc-devel at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>
>

-- 
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpages at fhcrc.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319


