[Rd] type.convert and doubles
Martin Maechler
maechler at stat.math.ethz.ch
Tue Apr 29 10:58:14 CEST 2014
>>>>> peter dalgaard <pdalgd at gmail.com>
>>>>> on Tue, 29 Apr 2014 09:32:21 +0200 writes:
> On 28 Apr 2014, at 19:17 , Martin Maechler <maechler at stat.math.ethz.ch> wrote:
>>
> [...snip...]
>>>> I think there should be two separate discussions:
>>
>>>> a) have an option (argument to type.convert and possibly
>>>> read.table) to enable/disable this behavior. I'm strongly
>>>> in favor of this.
>>
>>> In my (not committed) version of R-devel, I now have
>>
>>>> str(type.convert(format(1/3, digits=17), exact=TRUE))
>>> Factor w/ 1 level "0.33333333333333331": 1
>>>> str(type.convert(format(1/3, digits=17), exact=FALSE))
>>> num 0.333
>>
>>> where the 'exact' argument name has been ``imported'' from
>>> the underlying C code.
>>
>>> [ As we CRAN package writers know by now, arguments
>>> nowadays can hardly be abbreviated anymore, so I am
>>> not open to longer alternative argument names. As someone
>>> who likes blind typing, I'm not fond of camel case or other
>>> keyboard gymnastics (;-), but if someone has a great idea
>>> for a better argument name.... ]
>>
>>> Instead of only TRUE/FALSE, we could consider NA with
>>> semantics "FALSE + warning" or also "TRUE + warning".
>>
>>
>>>> b) decide what the default for a) will be. I have no
>>>> strong opinion, I can see arguments in both directions
>>
>>> I think many have seen the good arguments in both
>>> directions. I'm still strongly advocating that we value
>>> long term stability higher here, and revert to more
>>> compatibility with the many years of previous versions.
>>
>>> If we'd use a default of 'exact=NA', I'd like it to mean
>>> FALSE + warning, but would not much oppose TRUE +
>>> warning.
>>
>> I have now committed svn rev 65507 --- to R-devel only for now ---
>> the above: exact = NA is the default
>> and it means "warning + FALSE".
>>
>> Interestingly, I currently get 5 identical warnings for one
>> simple call, so there clearly seems to be room for optimization,
>> and that is one main reason for this change not yet being
>> migrated to 'R 3.1.0 patched'.
> I actually think that the default should be the old behaviour. No warning, just potentially lose digits. If this gets a user in trouble, _then_ turn on the check for lost digits.
> After all, I think we had about one single use case, where lost digits caused trouble (I cannot even dig up what the case was - someone had, like, 20-digit ID labels, I reckon). In contrast, we have seen umpteen cases where people have exported floating point data to slightly beyond machine precision, "just in case", and relied on read.table() to do the sensible thing.
> It's also an open question whether we really want to apply the same logic to doubles and integer inputs.
A really good point. From my cursory reading of the code, it is
not obvious where to make the distinction without quite a bit
more coding, but I may just have overlooked a good idea.
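For illustration only, here is a rough R-level sketch of what such a distinction could look like. It is not the C code behind type.convert; the helper names is_integer_like() and long_integer_loss() are hypothetical, and the approach (a regular expression plus a round trip through sprintf) is an assumption chosen for brevity. Only integer-looking strings are checked, so decimal fractions with surplus digits are never flagged.

```r
# Hypothetical helpers, not part of R.
# TRUE for strings that consist of an optional sign and decimal digits only.
is_integer_like <- function(s) grepl("^[+-]?[0-9]+$", s)

# TRUE where an integer-looking string cannot be held exactly in a double.
# Non-integer input (e.g. "0.12345678901234567") is deliberately ignored.
long_integer_loss <- function(s) {
  out <- logical(length(s))
  int <- is_integer_like(s)
  # Drop the sign and leading zeros so only the digits are compared.
  digits <- sub("^[+-]?0*(?=[0-9])", "", s[int], perl = TRUE)
  # Convert to double and print the stored value back without a fraction.
  back <- sprintf("%.0f", abs(as.numeric(s[int])))
  out[int] <- digits != back
  out
}

long_integer_loss(c("12345678901234567890", "42", "0.12345678901234567"))
# TRUE FALSE FALSE
```

The 20-digit label is flagged because it lies between two representable doubles, while the 17-digit fraction passes through untouched, which is the behaviour people exporting "just in case" digits rely on.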
> The whole change went in as (r62327)
> "force type.convert to read e.g. 64-bit integers as strings/factors"
> I, for one, did not expect that "e.g." would include 0.12345678901234567. My eyes were on the upcoming 3.0.0 release at that point, so I might not have noticed it anyway, but apparently no one lifted an eyebrow. It seems that this was deliberately postponed for 3.1.0, but for more than a year, no one actually exercised the code.
> -pd
> BTW, "exact" is a horrible name for an option, how about digitloss=c("allow", "warn", "forbid")?
I've also thought quickly about switching to an "enumeration
type" with string options.
If we were to distinguish integer and non-integer input (and
hexadecimal vs decimal which are already different code branches),
we would need more than three options anyway ...
and when I start thinking about the possibilities, I start to
see too many "desirable" possibilities, e.g.,
digitloss="allow for non-integers, don't warn"
digitloss="allow for non-integers, do warn"
digitloss="forbid, don't warn"
digitloss="forbid, do warn"
etc... which would speak for a different approach, maybe with
yet another argument for dealing with "long integer" only.
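To make that last variant concrete, here is a minimal sketch of a three-valued string option combined with a separate logical argument for long integers. Everything in it is hypothetical: convert_numbers(), sig_digits() and the argument long_integers_only are made-up names, not an existing or proposed R API. Digit loss is only guessed at by counting more than 15 significant digits, which is an assumption, handles plain decimal notation only (no exponents, no hexadecimal), and also counts trailing zeros.

```r
# Hypothetical helper, not part of R: number of significant decimal digits
# in plain decimal strings such as "-0.00123" or "12345".
sig_digits <- function(s) {
  d <- gsub("[^0-9]", "", s)   # keep digits only (drops sign and ".")
  nchar(sub("^0+", "", d))     # leading zeros are not significant
}

# Hypothetical wrapper, not part of R.
convert_numbers <- function(s, digitloss = c("allow", "warn", "forbid"),
                            long_integers_only = FALSE) {
  digitloss <- match.arg(digitloss)
  # Heuristic: more than 15 significant digits may not survive as a double.
  risky <- sig_digits(s) > 15
  if (long_integers_only) risky <- risky & grepl("^[+-]?[0-9]+$", s)
  if (digitloss == "forbid" && any(risky)) return(s)  # keep the strings
  if (digitloss == "warn" && any(risky))
    warning("possible loss of digits in ", sum(risky), " value(s)")
  as.numeric(s)
}

# Long ID labels stay character, surplus fraction digits are dropped silently.
convert_numbers("12345678901234567890", "forbid", long_integers_only = TRUE)
convert_numbers("0.12345678901234567", "forbid", long_integers_only = TRUE)
```

With two arguments the four combinations listed above fall out of three string values and one logical, rather than needing one string per combination.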
OTOH, I don't feel like spending considerably more time on
this now, unless others are willing to help as well (coding + testing).
Martin