[Rd] R 2.5.0 refuses to print enough digits to recover exact floating point values
Prof Brian Ripley
ripley at stats.ox.ac.uk
Wed May 23 19:32:36 CEST 2007
I think this is a bug in the MacOS X runtime. I've checked the C99
standard, and can see no limits on the precision that you should be able
to specify to printf.
Not that some protection against such OSes would come amiss.
However, the C99 standard does make clear that [sf]printf is not required
(even as 'recommended practice') to be accurate to more than *_DIG places,
which as ?as.character has pointed out is 15 on the machines R runs on.
It really is the case that writing out a number to > 15 significant digits
and reading it back in again is not required to give exactly the same
IEC60559 (sic) number, and furthermore there are R platforms for which
this does not happen. What Mr Weinberg claimed is 'now impossible' never
was possible in general (and he seems to be berating the R developers for
not striving to do better than the C standard requires of OSes). In fact,
I believe this to be impossible unless you have access to extended
precision arithmetic, as otherwise printf/scanf have to use the same
arithmetic as the computations.
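A minimal sketch of the round trip in question, assuming a platform whose sprintf and string-to-double conversion happen to be accurate beyond 15 significant digits (which, as noted above, the C standard does not require). The value 0.1 + 0.2 is only an illustrative choice; sprintf() and as.numeric() are standard base R functions.

```r
# Illustrative value: 0.1 + 0.2 is a double close to, but not equal to, 0.3
x <- 0.1 + 0.2

# Write with 15 significant digits (DBL_DIG on IEC 60559 doubles)
s15 <- sprintf("%.15g", x)
# Write with 17 significant digits (beyond what C99 requires to be accurate)
s17 <- sprintf("%.17g", x)

print(s15)
print(s17)

# Read back and compare with the original double
print(identical(as.numeric(s15), x))  # FALSE: "0.3" is a different double
print(identical(as.numeric(s17), x))  # platform dependent; not guaranteed by C99
```

Whether the second comparison succeeds depends on the accuracy of the platform's conversion routines, so it should be checked rather than assumed.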
This is why R supports transfer of floating-point numbers via readBin and
friends, and uses a binary format itself for save() (by default).
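As a small sketch of the binary route, assuming the default 8-byte representation of doubles: writeBin() and readBin() also accept a raw vector in place of a connection, so the round trip can be tried without touching a file. The vector x is only an illustrative choice.

```r
# Illustrative values whose decimal expansions do not terminate
x <- c(exp(1), pi, 0.1 + 0.2)

# Serialise the doubles to raw bytes (8 bytes per value by default)
bytes <- writeBin(x, raw())

# Read the same bytes back as doubles
y <- readBin(bytes, what = "double", n = length(x))

print(length(bytes))    # 24
print(identical(x, y))  # TRUE: no decimal conversion is involved
```

The same calls work with a file or other connection opened in binary mode in place of the raw vector.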
I should also say that any algorithm that depends on that level of detail
in the numbers will depend on the compiler used and optimization level and
so on. Don't expect repeatability to that level even with binary data
unless you (are allowed to) never apply bugfixes to your system.
On Wed, 23 May 2007, hadley wickham wrote:
> On 5/23/07, hadley wickham <h.wickham at gmail.com> wrote:
>> On 5/22/07, Uwe Ligges <ligges at statistik.uni-dortmund.de> wrote:
>>> Zack Weinberg wrote:
>>>> I have noticed that in R 2.5.0, no method of textual output will print
>>>> a "double" mode quantity with more than 15 digits after the decimal
>>>> point. From the help page (?print.default) it appears that this is
>>>> intentional, since digits after the fifteenth may be uncertain.
>>>> However, fifteen digits after the decimal point are not enough to
>>>> represent all the values that an IEEE-double can take. (You need one
>>>> more.) This means it is now impossible to write out data in textual
>>>> format (e.g. in order to manipulate it with another program) and read
>>>> back in exactly the same values. Some analyses are sensitive to this
>>>> sort of extra rounding, especially if it happens repeatedly.
>>>> I'd really appreciate some way of forcing R to print enough digits to
>>>> represent every possible IEEE double value. I would also argue that
>>>> this should be the default behavior of dump(), write.table() and
>>>> friends, and save(...,ascii=TRUE), to prevent data loss.
>>> formatC(exp(1), digits=100, width=-1)
>> formatC(exp(1), digits=1000000, width=-1)
>> *** caught bus error ***
>> address 0x2, cause 'non-existent physical address'
>> R version 2.5.0 (2007-04-23)
> Ooops, and the traceback:
> 1: .C("str_signif", x = x, n = n, mode = as.character(mode), width =
> as.integer(width), digits = as.integer(digits), format =
> as.character(format), flag = as.character(flag), result =
> blank.chars(i.strlen), PACKAGE = "base")
> 2: formatC(exp(1), digits = 1e+06, width = -1)
Brian D. Ripley, ripley at stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595