[Rd] How to handle INT8 data
Nicolas Paris
nicolas.paris at aphp.fr
Fri Jan 20 18:47:55 CET 2017
Well, I definitely cannot use them as numeric, because joining is the
main purpose of these identifiers.
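To illustrate why numeric keys are unsafe for joins, here is a minimal sketch. The second identifier is a hypothetical value invented for this example (it differs from the first only in its last digit); it is not taken from the dataset.

```r
# Two distinct 64-bit identifiers, kept as text so that no digit is lost.
id_a <- "-1311071933951566764"
id_b <- "-1311071933951566765"  # hypothetical neighbour of id_a

# Compared as character strings, the two keys are different.
same_as_text <- (id_a == id_b)

# Converted to double, both round to the same representable value,
# so a join on the numeric column would treat them as one key.
same_as_double <- (as.numeric(id_a) == as.numeric(id_b))

print(c(text = same_as_text, double = same_as_double))
```

Doubles near this magnitude are spaced 256 apart, which is why neighbouring identifiers collapse onto one value.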
As for the int64 and bit64 packages, they are not a solution here,
because I am releasing a dataset to external users. I cannot ask them
to install a package just to use it.
I have to be very careful when releasing the data. If a user simply
calls read.csv or a similar function, the identifiers are converted to
numeric by default.
$ more res.csv
"col1";"col2"
"-1311071933951566764";"toto"
"-1311071933951566764";"tata"
> read.table("res.csv",sep=";",header=T)
           col1 col2
1 -1.311072e+18 toto
2 -1.311072e+18 tata
> sapply(read.table("res.csv",sep=";",header=T),class)
     col1      col2
"numeric"  "factor"
> read.table("res.csv",sep=";",header=T,colClasses="character")
                  col1 col2
1 -1311071933951566764 toto
2 -1311071933951566764 tata
Am I condemned to provide an R script along with the data so that users
can exploit the dataset?
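For illustration only, here is a sketch of what a short loading snippet shipped with the data might look like. It uses base R only, so no extra package is needed. The file contents, the second table `other`, and its `site` column are hypothetical and exist only to make the example self-contained.

```r
# Write a small semicolon-separated file shaped like res.csv
# (hypothetical content, for the example only).
csv_path <- tempfile(fileext = ".csv")
writeLines(c('"col1";"col2"',
             '"-1311071933951566764";"toto"',
             '"-1311071933951566765";"tata"'), csv_path)

# Force only the identifier column to character; the other columns
# are still type-guessed by read.table.
res <- read.table(csv_path, sep = ";", header = TRUE,
                  colClasses = c(col1 = "character"),
                  stringsAsFactors = FALSE)

# A second hypothetical table keyed on the same identifiers.
other <- data.frame(col1 = c("-1311071933951566764",
                             "-1311071933951566765"),
                    site = c("A", "B"),
                    stringsAsFactors = FALSE)

# Joining on character keys compares every digit, so the two
# neighbouring identifiers stay distinct.
joined <- merge(res, other, by = "col1")
print(joined)
```

Naming the column in `colClasses` keeps the override limited to the identifier, so users would only need to copy that one argument.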
On 20 Jan 2017 at 18:29, Murray Stokely wrote:
> 2^53 == 2^53+1
> TRUE
>
> Which makes joining or grouping data sets with 64 bit identifiers problematic.
>
> Murray (mobile)
>
> On Jan 20, 2017 9:15 AM, "Nicolas Paris" <nicolas.paris at aphp.fr> wrote:
>
> On 20 Jan 2017 at 18:09, Murray Stokely wrote:
> > The lack of 64 bit integer support causes lots of problems when
> > dealing with certain types of data where the loss of precision from
> > coercing to 53 bits with double is unacceptable.
>
> Hello Murray,
> Do you mean that, e.g., -1311071933951566764 loses precision when it
> goes through as.numeric(-1311071933951566764)?
> Thanks,
> >
> > Two packages were developed to deal with this: int64 and bit64.
> >
> > You may need to find archival versions of these packages if they've
> > fallen off CRAN.
> >
> > Murray (mobile phone)
> >
> > On Jan 20, 2017 7:20 AM, "Gabriel Becker" <gmbecker at ucdavis.edu> wrote:
> >
> > I am not on R-core, so cannot speak to future plans to internally
> > support int8 (though my impression is that there aren't any, at
> > least none that are close to fruition).
> >
> > The standard way of dealing with whole numbers too big to fit in an
> > integer is to put them in a numeric (double down in C land). This
> > can represent integers up to 2^53 without loss of precision (see
> > http://stackoverflow.com/questions/1848700/biggest-integer-that-can-be-stored-in-a-double).
> > This is how long vector indices are (currently) implemented in R.
> > If it's good enough for indices it's probably good enough for
> > whatever you need them for.
> >
> > Hope that helps.
> >
> > ~G
> >
> >
> > On Fri, Jan 20, 2017 at 6:33 AM, Nicolas Paris <nicolas.paris at aphp.fr> wrote:
> >
> > > Hello R users,
> > >
> > > I have to deal with int8 data in R. AFAIK R only handles int4,
> > > with the `as.integer` function [1]. I wonder:
> > > 1. What is the best approach to handle int8? `as.character`?
> > > `as.numeric`?
> > > 2. Is there any plan to handle int8 in the future? As you might
> > > know, int4 is too small to cover the Earth's population right now.
> > >
> > > Thanks for your ideas,
> > >
> > > int8 eg:
> > >
> > > human_id
> > > ----------------------
> > > -1311071933951566764
> > > -4708675461424073238
> > > -6865005668390999818
> > > 5578000650960353108
> > > -3219674686933841021
> > > -6469229889308771589
> > > -606871692563545028
> > > -8199987422425699249
> > > -463287495999648233
> > > 7675955260644241951
> > >
> > > reference:
> > > 1. https://www.r-bloggers.com/r-in-a-64-bit-world/
> > >
> > > --
> > > Nicolas PARIS
> > >
> > > ______________________________________________
> > > R-devel at r-project.org mailing list
> > > https://stat.ethz.ch/mailman/listinfo/r-devel
> > >
> >
> >
> >
> > --
> > Gabriel Becker, PhD
> > Associate Scientist (Bioinformatics)
> > Genentech Research
> >
> >
> >
>
> --
> Nicolas PARIS
>
>
--
Nicolas PARIS