[R-sig-DB] data type mapping for RMySQL

Hadley Wickham h.wickham at gmail.com
Fri Dec 20 19:04:21 CET 2013


I don't think there's any way to preserve precision here, but the R
client should at least give a warning that this is happening.

Hadley
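A minimal base-R sketch of the rounding discussed below (no database needed): the 19-digit literal is parsed into a double, which snaps to the nearest representable value, while a character representation round-trips exactly. The 15-16 digit figure comes from the 53-bit double mantissa.

```r
# The value from Tim's table, parsed as a double (R's default numeric type)
x <- 2029716610205351937

# Doubles have a 53-bit mantissa, i.e. roughly 15-16 decimal digits, so
# this 19-digit value is silently rounded to the nearest representable
# double (doubles near 2e18 are spaced 256 apart):
sprintf("%.0f", x)
# "2029716610205351936"

# Jim's workaround: keep the value as character; it round-trips exactly
# and can still be compared for equality, though not used in arithmetic.
y <- "2029716610205351937"
y == "2029716610205351937"
# TRUE
```

At the database-interface level, one workaround is to cast in the query itself, e.g. `SELECT CAST(address AS CHAR) FROM tester`. The bit64 package also provides a true 64-bit integer class (`integer64`) that can hold such values exactly, although whether a given DBI backend maps bigint onto it depends on the driver.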

On Fri, Dec 20, 2013 at 7:38 AM, jim holtman <jholtman using gmail.com> wrote:
> There is no direct mapping for a 'bigint' that has 20 digits.  The
> floating point type in R (and in any other language on your computer
> using IEEE floating point) is limited to about 15 significant digits.
> In the cases where I have had to process this kind of numeric data, I
> have read it in as a character value.  This means that I cannot do
> arithmetic on it, but at least I can compare it to other values
> correctly.  This is a variation of R FAQ 7.31.
>
> Jim Holtman
> Data Munger Guru
>
> What is the problem that you are trying to solve?
> Tell me what you want to do, not how you want to do it.
>
>
> On Fri, Dec 20, 2013 at 10:28 AM, Paul Gilbert <pgilbert902 using gmail.com> wrote:
>> Just for the record, this also happens in PostgreSQL, so the problem is more
>> general. (I tried testing in SQLite, but my standalone client seems to be
>> encrypting the db and RSQLite cannot use it.) I'm not sure there is a
>> solution to this, as I do not immediately see how to represent bigint in R.
>>
>> Generic SQL example:
>>
>>>mysql test
>>>psql  test
>>>sqlite test
>>
>> [ DROP TABLE tester; ]
>>
>> CREATE TABLE  tester (
>>       id bigint NOT NULL,
>>       address bigint NOT NULL
>>     ) ;
>>
>> INSERT INTO tester VALUES (1, 2029716610205351937);
>>
>> SELECT * FROM tester ;
>>
>> gives
>>   1 | 2029716610205351937
>> in mysql, psql, and sqlite
>>
>> ( exit; /q or .quit )
>> (later DROP TABLE tester;)
>>
>>
>>>R
>>
>> require("RMySQL")
>> hubdb <- dbConnect("MySQL", dbname="test" )
>> print(fetch(dbSendQuery(hubdb, "select * from tester;"), n=-1), digits=20)
>>
>> RMySQL gives
>>   id             address
>> 1  1 2029716610205351936
>>
>>
>> require("RPostgreSQL")
>> hubdb <- dbConnect("PostgreSQL", dbname="test" )
>> print(fetch(dbSendQuery(hubdb, "select * from tester;"), n=-1), digits=20)
>> RPostgreSQL gives
>>   id             address
>> 1  1 2029716610205351936
>>
>>
>> Paul
>>
>>
>>
>> On 13-12-20 08:07 AM, Tim Coote wrote:
>>>
>>> Hullo
>>>
>>> I’m not sure whether this is a bug, but I would expect some sort of
>>> warning where mappings may lose precision.
>>>
>>> I’ve got a mysql table with large numbers in it:
>>>
>>> CREATE TABLE if not exists `tester` (
>>>        `id` bigint(20) NOT NULL AUTO_INCREMENT,
>>>        `address` bigint(20) NOT NULL,
>>>        PRIMARY KEY (`id`),
>>>        UNIQUE KEY `idx_address` (`address`)
>>>      ) ENGINE=InnoDB AUTO_INCREMENT=78628 DEFAULT CHARSET=latin1 ;
>>>
>>> The key aspect of this is that the ‘address’ column needs to be used to
>>> distinguish between values. If I put one value into this table, and read it
>>> back with the mysql client:
>>>
>>> mysql> \. test.sql
>>> Query OK, 1 row affected, 1 warning (0.00 sec)
>>>
>>> Database changed
>>> Query OK, 0 rows affected (0.05 sec)
>>>
>>> Query OK, 1 row affected (0.01 sec)
>>>
>>> +---------------------+
>>> | address             |
>>> +---------------------+
>>> | 2029716610205351937 |
>>> +---------------------+
>>> 1 row in set (0.00 sec)
>>>
>>> But if I try to read the value using R:
>>>>
>>>> library(RMySQL)
>>>
>>> Loading required package: DBI
>>>>
>>>> options(digits=20)
>>>>
>>>> hubdb <- dbConnect("MySQL", username="root", host="localhost",
>>>> password="dummy", dbname="dummy" )
>>>> q = sprintf("select * from tester;")
>>>>
>>>>
>>>> testvals <- fetch(dbSendQuery(hubdb, q), n=-1)
>>>>
>>>> print (testvals)
>>>
>>>                address
>>> 1 2029716610205351936
>>>>
>>>>
>>>>
>>> Note that the address value is different.
>>>
>>> The issue is that the precision of the SQL type (20 decimal digits)
>>> is larger than the precision of the R type it's been mapped to.  I
>>> don't know whether this is a bug or not - it would have helped to
>>> have a warning about the possible precision mismatch, rather than
>>> having to spot the emerging bug ;-).  That aside, is there a simple
>>> way to map the bigint type to something else?  For this situation,
>>> character would do.  I know that I can cast the type in the SQL
>>> query, but I'd prefer to set the mapping at a more global scope, in
>>> the data definition, if I could.
>>>
>>> tia
>>>
>>> Tim
>>> _______________________________________________
>>> R-sig-DB mailing list -- R Special Interest Group
>>> R-sig-DB using r-project.org
>>> https://stat.ethz.ch/mailman/listinfo/r-sig-db
>>>



-- 
http://had.co.nz/



