[Rd] type.convert and doubles

Fri Apr 18 00:51:53 CEST 2014

On 04/17/2014 02:21 PM, Murray Stokely wrote:
> On Thu, Apr 17, 2014 at 6:42 AM, McGehee, Robert
> <Robert.McGehee at geodecapital.com> wrote:
>> Here's my use case: I have a function that pulls arbitrary
>> financial data from a web service call such as a stock's industry,
>> price, volume, etc. by reading the web output as a text table. The
>> data may be either character (industry, stock name, etc.) or
>> numeric (price, volume, etc.), and the function generally doesn't
>> know the class in advance. The problem is that we frequently get
>> numeric values represented with more precision than actually
>> exists, for instance a price of "2.6999999999999999" rather than
>> "2.70". The numeric representation is exactly one digit too much
>> for type.convert which (in R 3.10.0) converts it to character
>> instead of numeric (not what I want). This caused a bunch of
>> "non-numeric argument to binary operator" errors to appear today as
>> numeric data was now being represented as characters.
>>
>> I have no doubt that this probably will cause some unwanted RODBC
>> side effects for us as well. IMO, getting the class right is more
>> important than infinite precision. What use is a character
>> representation of a number anyway if you can't perform arithmetic
>> on it? I would favor at least making the new behavior optional, but
>> I think many packages (like RODBC) potentially need to be patched
>> to code around the new feature if it's left in.
>
> The uses of character representation of a number are many: unique
> identifiers/user ids, hash codes, timestamps, or other values where
> rounding results to the nearest value that can be represented as a
> numeric type would completely change the results of any data
> analysis performed on that data.
>
> Database join operations are certainly an area where R's previous
> behavior of silently dropping precision of numbers with type.convert
> can get you into trouble.  For example, things like join operations
> or group by operations performed in R code would produce erroneous
> results if you are joining/grouping by a key without the full
> precision of your underlying data.  Records can get joined up
> incorrectly or aggregated with the wrong groups.

I don't understand this. Assuming you are sending the SQL statement to 
the database engine, none of this erroneous matching is happening in R. 
The calculations all happens on the database.

But, for the case where the database does know that numbers are double 
precision, it would be nice if they got transmitted by ODBC to R as 
numerics (the usual translation) just as they are by the native 
interfaces like RPostgreSQL. Do you get the erroneous results when you 
use a native interface?

( from second response:)
> You want a casting operation in your SQL query or similar if you want
> a rounded type that will always fit in a double.  Cast or Convert
> operators in SQL, or similar for however you are getting the data you
> want to use with type.convert().  This is all application specific and
> sort of beyond the scope of type.convert(), which now behaves as it
> has been documented to behave.

This seems to suggests I need to use different SQL statements depending 
on which interface I use to talk to the database.

If you do 1/3 in a database calculation and that ends up being 
represented as something more accurate than double precision on the 
database, then it needs to be transmitted as something with higher 
precision (character/factor?). If the result is double precision it 
should be sent as double precision, not as something pretending to be 
more accurate.

I suspect the difficulty with ODBC may be that type.convert() really 
should not be called when both ends of the communication know that a 
double precision number is being exchanged.

Paul

> If you later want to do arithmetic on them, you can choose to lose
> precision by using as.numeric() or use one of the large number
> packages on CRAN (GMP, int64, bit64, etc.).  But once you've dropped
> the precision with as.numeric you can never get it back, which is
> why the previous behavior was clearly dangerous.
>
> I think I had some additional examples in the original bug/patch I
> filed about this issue a few years ago, but I'm unable to find it on
> bugs.r-project.org and its not referenced in the cl descriptions or
> news file.
>
> - Murray
>