[Rd] How to handle INT8 data
Willem Ligtenberg
willem at wligtenberg.nl
Fri Jan 20 20:28:32 CET 2017
You might want to use data.table then.
Its fread() function will automatically detect that the column holds 64-bit integers.
Admittedly, in that case the user will also have to install the data.table
package (which is a good idea anyway, in my opinion :) ).
It will then, of course, let you join tables on those identifiers.
Willem
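A minimal sketch of Willem's suggestion, assuming data.table and its companion package bit64 are both installed (fread stores large integer columns as bit64::integer64, controlled by its `integer64` argument). The file is written without quotes, unlike the res.csv file shown later in the thread, so the sketch does not depend on how fread types quoted fields. The `lookup` table and its `label` column are hypothetical.

```r
# Sketch of the data.table route; it assumes data.table and bit64 are installed
# and skips the work (leaving the results NA) when they are not.
f <- tempfile(fileext = ".csv")
# Written without quotes (an assumption of this sketch)
writeLines(c("col1;col2",
             "-1311071933951566764;toto",
             "-1311071933951566764;tata"), f)

detected_class <- NA_character_
join_rows <- NA_integer_
if (requireNamespace("data.table", quietly = TRUE) &&
    requireNamespace("bit64", quietly = TRUE)) {
  # integer64 = "integer64" keeps integers too large for R's int as bit64::integer64
  dt <- data.table::fread(f, sep = ";", integer64 = "integer64")
  detected_class <- class(dt$col1)[1]

  # Hypothetical lookup table keyed by the same 64-bit identifier
  lookup <- data.table::data.table(
    col1  = bit64::as.integer64("-1311071933951566764"),
    label = "example"
  )
  joined <- merge(dt, lookup, by = "col1")
  join_rows <- nrow(joined)  # both rows of dt match the single lookup key
}
detected_class
```

Note that this still requires every user of the dataset to install two packages, which is exactly Nicolas's objection below.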
On 20-01-17 18:47, Nicolas Paris wrote:
> Well, I definitely cannot use them as numeric, because joining is the main
> purpose of those identifiers.
>
> As for the int64 and bit64 packages, they are not a solution, because I am
> releasing a dataset for external users. I cannot ask them to install a
> package in order to use it.
>
> I have to be very careful when releasing the data. If a user just uses the
> read.csv functions, the identifiers are cast to numeric by default.
>
> $ more res.csv
> "col1";"col2"
> "-1311071933951566764";"toto"
> "-1311071933951566764";"tata"
>
>
>> read.table("res.csv",sep=";",header=T)
> col1 col2
> 1 -1.311072e+18 toto
> 2 -1.311072e+18 tata
>
>> sapply(read.table("res.csv",sep=";",header=T),class)
> col1 col2
> "numeric" "factor"
>
>> read.table("res.csv",sep=";",header=T,colClasses="character")
> col1 col2
> 1 -1311071933951566764 toto
> 2 -1311071933951566764 tata
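For readers who only have base R, a minimal sketch of the workaround shown above: read the identifier column as character and join on it. It writes a copy of res.csv to a temporary file; the `labels` table, its `label` column and the neighbouring ID are hypothetical. In `colClasses`, `NA` keeps R's default type conversion for col2.

```r
# Sketch: keep 64-bit identifiers lossless in base R by reading them as character.
res <- tempfile(fileext = ".csv")
writeLines(c('"col1";"col2"',
             '"-1311071933951566764";"toto"',
             '"-1311071933951566764";"tata"'), res)

# "character" for col1 keeps every digit; NA lets R choose the type of col2
ids <- read.table(res, sep = ";", header = TRUE,
                  colClasses = c("character", NA))
str(ids)

# Hypothetical second table keyed by the same identifier, also as character
labels <- data.frame(col1  = c("-1311071933951566764", "-1311071933951566765"),
                     label = c("match", "neighbour"),
                     stringsAsFactors = FALSE)

# Joining on character keys compares the exact digit strings, so the
# neighbouring ID ...765 does not match ...764
joined <- merge(ids, labels, by = "col1")
print(joined)
```

The cost is that users must remember `colClasses`, and arithmetic on the IDs is no longer possible, which is usually acceptable for pure join keys.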
>
> Am I condemned to provide an R script with the data in order to use the dataset?
>
> On 20 Jan 2017 at 18:29, Murray Stokely wrote:
>> 2^53 == 2^53+1
>> TRUE
>>
>> Which makes joining or grouping data sets with 64 bit identifiers problematic.
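Murray's point can be checked directly with identifiers like Nicolas's. In this sketch the second ID is hypothetical (the first minus one); both lie far beyond 2^53, where adjacent doubles are 256 apart, so they convert to the same double and a numeric join or grouping would treat them as one key.

```r
# Sketch: two distinct 64-bit identifiers collapse to one double.
a <- "-1311071933951566764"  # from Nicolas's data
b <- "-1311071933951566765"  # hypothetical neighbour: a minus 1

a == b                          # FALSE: the strings differ
as.numeric(a) == as.numeric(b)  # TRUE: both round to the same double

# Near 1.3e18, adjacent doubles are 2^8 = 256 apart, so most neighbouring
# integers cannot be told apart once converted.
length(unique(c(a, b)))              # 2 distinct keys as character
length(unique(as.numeric(c(a, b))))  # 1 key after conversion to numeric
```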
>>
>> Murray (mobile)
>>
>> On Jan 20, 2017 9:15 AM, "Nicolas Paris" <nicolas.paris at aphp.fr> wrote:
>>
>> On 20 Jan 2017 at 18:09, Murray Stokely wrote:
>> > The lack of 64 bit integer support causes lots of problems when dealing with
>> > certain types of data where the loss of precision from coercing to 53 bits
>> > with double is unacceptable.
>>
>> Hello Murray,
>> Do you mean that, e.g., -1311071933951566764 loses precision during
>> as.numeric(-1311071933951566764)?
>> Thanks,
>> >
>> > Two packages were developed to deal with this: int64 and bit64.
>> >
>> > You may need to find archived versions of these packages if they've
>> > fallen off CRAN.
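A minimal sketch of the bit64 route Murray mentions, assuming the package is installed (the block leaves its results NA otherwise). The second ID is hypothetical; the point is that integer64 keeps both the exact digits and the distinction between neighbouring keys.

```r
# Sketch of the bit64 route; assumes bit64 is installed, otherwise results stay NA.
ids <- c("-1311071933951566764",   # from Nicolas's data
         "-1311071933951566765")   # hypothetical neighbour, one less

round_trip_ok <- NA
distinct_keys <- NA_integer_
if (requireNamespace("bit64", quietly = TRUE)) {
  x <- bit64::as.integer64(ids)                 # parse as exact 64-bit integers
  round_trip_ok <- all(as.character(x) == ids)  # every digit survives
  distinct_keys <- length(unique(x))            # the neighbours stay distinct
}
c(round_trip_ok = round_trip_ok, distinct_keys = distinct_keys)
```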
>> >
>> > Murray (mobile phone)
>> >
>> > On Jan 20, 2017 7:20 AM, "Gabriel Becker" <gmbecker at ucdavis.edu> wrote:
>> >
>> > I am not on R-core, so I cannot speak to future plans to internally support
>> > int8 (though my impression is that there aren't any, at least none that are
>> > close to fruition).
>> >
>> > The standard way of dealing with whole numbers too big to fit in an integer
>> > is to put them in a numeric (double down in C land). This can represent
>> > integers up to 2^53 in magnitude without loss of precision (see
>> > http://stackoverflow.com/questions/1848700/biggest-integer-that-can-be-stored-in-a-double).
>> > This is how long vector indices are (currently) implemented in R. If it's
>> > good enough for indices, it's probably good enough for whatever you need
>> > them for.
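Gabriel's 2^53 bound can be checked directly. The sketch below uses a hypothetical helper, `survives_double()`, that tests whether a decimal ID string comes back unchanged after conversion to double. It assumes plain integer strings (no leading zeros or "+" sign) and relies on sprintf("%.0f") printing whole numbers exactly, which holds for values of this size.

```r
# Sketch: the 2^53 limit for storing whole numbers in a double.
2^53 - 1 == 2^53 - 2   # FALSE: still exact below 2^53
2^53 + 1 == 2^53       # TRUE: 2^53 + 1 is not representable

# Hypothetical helper: does each decimal ID string survive a round trip through
# double? Assumes plain integers without leading zeros or a "+" sign.
survives_double <- function(ids) {
  sprintf("%.0f", as.numeric(ids)) == ids
}

survives_double(c("123456789", "9007199254740992",
                  "9007199254740993", "-1311071933951566764"))
# TRUE TRUE FALSE FALSE
```

A check like this could tell a data publisher whether a given ID column is safe to ship as numeric; Nicolas's IDs are not.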
>> >
>> > Hope that helps.
>> >
>> > ~G
>> >
>> >
>> > On Fri, Jan 20, 2017 at 6:33 AM, Nicolas Paris <nicolas.paris at aphp.fr>
>> > wrote:
>> >
>> > > Hello R users,
>> > >
>> > > I have to deal with int8 data in R. AFAIK, R only handles int4,
>> > > via the `as.integer` function [1]. I wonder:
>> > > 1. What is the best approach to handle int8? `as.character`?
>> > >    `as.numeric`?
>> > > 2. Is there any plan to handle int8 in the future? As you might know,
>> > >    int4 is too small to enumerate the Earth's population right now.
>> > >
>> > > Thanks for your ideas,
>> > >
>> > > int8 example:
>> > >
>> > > human_id
>> > > ----------------------
>> > > -1311071933951566764
>> > > -4708675461424073238
>> > > -6865005668390999818
>> > > 5578000650960353108
>> > > -3219674686933841021
>> > > -6469229889308771589
>> > > -606871692563545028
>> > > -8199987422425699249
>> > > -463287495999648233
>> > > 7675955260644241951
>> > >
>> > > reference:
>> > > 1. https://www.r-bloggers.com/r-in-a-64-bit-world/
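A sketch of question 1 using three of the IDs above: R's integer type is 32-bit (int4), so `as.integer` turns these values into NA (with a warning, suppressed here), and their magnitudes exceed 2^53, the limit for exact whole numbers in a double. In base R, that leaves character as the lossless option.

```r
# Sketch: how base R's numeric types cope with a few of the int8 IDs above.
human_id <- c("-1311071933951566764", "5578000650960353108", "-606871692563545028")

.Machine$integer.max                       # 2147483647, i.e. 2^31 - 1 (int4)
as_int <- suppressWarnings(as.integer(human_id))
as_int                                     # NA NA NA: out of integer range

as_num <- as.numeric(human_id)
abs(as_num) > 2^53                         # TRUE TRUE TRUE: beyond exact doubles
```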
>> > >
>> > > --
>> > > Nicolas PARIS
>> > >
>> > > ______________________________________________
>> > > R-devel at r-project.org mailing list
>> > > https://stat.ethz.ch/mailman/listinfo/r-devel
>> > >
>> >
>> >
>> >
>> > --
>> > Gabriel Becker, PhD
>> > Associate Scientist (Bioinformatics)
>> > Genentech Research
>> >
>> >
>> >
>>
>> --
>> Nicolas PARIS
>>
>>