[Rd] type.convert and doubles
murray at stokely.org
Thu Apr 17 20:21:46 CEST 2014
On Thu, Apr 17, 2014 at 6:42 AM, McGehee, Robert
<Robert.McGehee at geodecapital.com> wrote:
> Here's my use case: I have a function that pulls arbitrary financial data from a web service call such as a stock's industry, price, volume, etc. by reading the web output as a text table. The data may be either character (industry, stock name, etc.) or numeric (price, volume, etc.), and the function generally doesn't know the class in advance. The problem is that we frequently get numeric values represented with more precision than actually exists, for instance a price of "2.6999999999999999" rather than "2.70". The numeric representation is exactly one digit too much for type.convert which (in R 3.10.0) converts it to character instead of numeric (not what I want). This caused a bunch of "non-numeric argument to binary operator" errors to appear today as numeric data was now being represented as characters.
> I have no doubt that this probably will cause some unwanted RODBC side effects for us as well. IMO, getting the class right is more important than infinite precision. What use is a character representation of a number anyway if you can't perform arithmetic on it? I would favor at least making the new behavior optional, but I think many packages (like RODBC) potentially need to be patched to code around the new feature if it's left in.
The uses of character representation of a number are many: unique
identifiers/user ids, hash codes, timestamps, or other values where
rounding results to the nearest value that can be represented as a
numeric type would completely change the results of any data analysis
performed on that data.
Database join operations are certainly an area where R's previous
behavior of silently dropping precision of numbers with type.convert
can get you into trouble. For example, things like join operations or
group by operations performed in R code would produce erroneous
results if you are joining/grouping by a key without the full
precision of your underlying data. Records can get joined up
incorrectly or aggregated with the wrong groups.
If you later want to do arithmetic on them, you can choose to lose
precision by using as.numeric() or use one of the large number
packages on CRAN (GMP, int64, bit64, etc.). But once you've dropped
the precision with as.numeric you can never get it back, which is why
the previous behavior was clearly dangerous.
I think I had some additional examples in the original bug/patch I
filed about this issue a few years ago, but I'm unable to find it on
bugs.r-project.org and its not referenced in the cl descriptions or
More information about the R-devel