[Rd] type.convert and doubles
Duncan Murdoch
murdoch.duncan at gmail.com
Sun Apr 27 18:26:31 CEST 2014
On 27/04/2014, 10:16 AM, Hadley Wickham wrote:
> Is there a reason it's a factor and not a string? A string would seem to be
> more appropriate to me (given that we know it's a number that can't be
> represented exactly by R).
The user asked that anything which can't be converted to a number should
be converted to a factor.
Yes, that's a bad default, but some people rely on it.
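Duncan's point is that the factor comes from the caller's own setting: type.convert() turns non-numeric input into a factor only when as.is = FALSE. A minimal sketch contrasting the two settings on the same mixed input; as.is is passed explicitly so the example does not depend on the default of whichever R version runs it.

```r
# Mixed input: two values parse as numbers, one does not.
x <- c("1.5", "abc", "2")

as_factor <- type.convert(x, as.is = FALSE)  # not all numeric -> factor
as_chr    <- type.convert(x, as.is = TRUE)   # not all numeric -> character

class(as_factor)
class(as_chr)
```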
Duncan Murdoch
>
> Hadley
>
> On Saturday, April 26, 2014, Martin Maechler <maechler at stat.math.ethz.ch> wrote:
>
>>>>>>> Simon Urbanek <simon.urbanek at r-project.org>
>>>>>>> on Sat, 19 Apr 2014 13:06:15 -0400 writes:
>>
>> > On Apr 19, 2014, at 9:00 AM, Martin Maechler <maechler at stat.math.ethz.ch> wrote:
>> >>>>>>> McGehee, Robert <Robert.McGehee at geodecapital.com>
>> >>>>>>> on Thu, 17 Apr 2014 19:15:47 -0400 writes:
>> >>
>> >>>> This is all application specific and
>> >>>> sort of beyond the scope of type.convert(), which now behaves as it
>> >>>> has been documented to behave.
>> >>
>> >>> That's only a true statement because the documentation was changed
>> >>> to reflect the new behavior! The new feature in type.convert certainly
>> >>> does not behave according to the documentation as of R 3.0.3. Here's a
>> >>> snippet:
>> >>
>> >>> The first type that can accept all the
>> >>> non-missing values is chosen (numeric and complex return values
>> >>> will [be] represented approximately, of course).
>> >>
>> >>> The key phrase is in parentheses, which reminds the user to expect
>> >>> a possible loss of precision. That important parenthetical was removed
>> >>> from the documentation in R 3.1.0 (among other changes).
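The documented rule Robert quotes ("the first type that can accept all the non-missing values is chosen") can be seen with a few small inputs. A minimal sketch, with as.is = TRUE so that non-numeric input stays character rather than becoming a factor; the last lines show the "represented approximately" caveat, since 0.1 has no exact binary representation.

```r
# Each call returns the first type that can hold every value.
class(type.convert(c("TRUE", "FALSE"), as.is = TRUE))  # "logical"
class(type.convert(c("1", "2"), as.is = TRUE))         # "integer"
class(type.convert(c("1", "2.5"), as.is = TRUE))       # "numeric"
class(type.convert(c("1", "2+3i"), as.is = TRUE))      # "complex"
class(type.convert(c("1", "x"), as.is = TRUE))         # "character"

# The stored double is only the nearest binary value to 0.1.
y <- type.convert("0.1", as.is = TRUE)
sprintf("%.20f", y)                                    # "0.10000000000000000555"
```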
>> >>
>> >>> Putting aside the fact that this introduces a large amount of
>> >>> unnecessary work rewriting SQL / data import code and SQL packages, my
>> >>> biggest conceptual problem is that I can no longer rely on a particular
>> >>> function call returning a particular class. In my example querying stock
>> >>> prices, about 5% of prices came back as factors and the remaining 95% as
>> >>> numeric, so we had random errors popping up throughout the morning.
>> >>
>> >>> Here's a short example showing how the new behavior can be
>> >>> unreliable. I pass a character representation of a uniformly distributed
>> >>> random variable to type.convert. 90% of the time it is converted to
>> >>> "numeric" and 10% it is a "factor" (in R 3.1.0). In the 10% of cases in
>> >>> which type.convert converts to a factor the leading non-zero digit is
>> >>> always a 9. So if you were expecting a numeric value, then 1 in 10 times
>> >>> you may have a bug in your code that didn't exist before.
>> >>
>> >>>> options(digits=16)
>> >>>> cl <- NULL; for (i in 1:10000) cl[i] <- class(type.convert(format(runif(1))))
>> >>>> table(cl)
>> >>> cl
>> >>>  factor numeric
>> >>>     990    9010
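Robert's underlying problem is that the class of the result depends on the data. When the caller already knows a column must be numeric, one way to avoid the question entirely is to skip type.convert() and parse with as.numeric(), failing loudly on anything that is not a number. A minimal sketch; parse_double_strict() is a hypothetical helper, not part of base R, and its na.strings argument only imitates the one type.convert() has.

```r
# Hypothetical helper (not in base R): parse strings as doubles, accepting
# rounding to the nearest double, and stop instead of silently returning a
# factor or character vector when a value is not a number.
parse_double_strict <- function(x, na.strings = "NA") {
  x <- as.character(x)
  x[x %in% na.strings] <- NA            # declared NA markers become missing
  out <- suppressWarnings(as.numeric(x))
  bad <- is.na(out) & !is.na(x)         # non-missing input that failed to parse
  if (any(bad))
    stop("not numeric: ", paste(head(x[bad], 3), collapse = ", "))
  out
}

# The simulation from the thread, using the helper instead of type.convert().
cl <- vapply(seq_len(10000),
             function(i) class(parse_double_strict(format(runif(1), digits = 16))),
             character(1))
table(cl)
```

The design choice here is to trade precision for a stable return type, which is the pre-3.1.0 behaviour, while still refusing genuinely non-numeric input.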
>> >>
>> >> Yes.
>> >>
>> >> Murray's point is valid, too.
>> >>
>> >> But in my view, with the reasoning we have seen here,
>> >> *and* with the well known software design principle of
>> >> "least surprise" in mind,
>> >> I also do think that the default for type.convert() should be what
>> >> it has been for > 10 years now.
>> >>
>>
>> > I think there should be two separate discussions:
>>
>> > a) have an option (argument to type.convert and possibly read.table)
>> > to enable/disable this behavior. I'm strongly in favor of this.
>>
>> In my (not committed) version of R-devel, I now have
>>
>> > str(type.convert(format(1/3, digits=17), exact=TRUE))
>> Factor w/ 1 level "0.33333333333333331": 1
>> > str(type.convert(format(1/3, digits=17), exact=FALSE))
>> num 0.333
>>
>> where the 'exact' argument name has been ``imported'' from the
>> underlying C code.
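For reference, in released R (3.1.1 and later) this switch is exposed through the numerals argument of type.convert() rather than under the name exact. A minimal sketch assuming R >= 3.1.1: numerals = "no.loss" corresponds roughly to exact = TRUE above, and the default "allow.loss" to exact = FALSE.

```r
# Assumes R >= 3.1.1, where the option is called 'numerals'.
x <- format(1/3, digits = 17)   # "0.33333333333333331", too many digits for a double

kept   <- type.convert(x, as.is = FALSE, numerals = "no.loss")    # left unconverted: factor
parsed <- type.convert(x, as.is = FALSE, numerals = "allow.loss") # nearest double

str(kept)
str(parsed)
```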
>>
>> [ As we CRAN package writers know by now, arguments can hardly be
>> abbreviated anymore, so I am not open to longer alternative argument
>> names; as someone who likes touch typing, I'm not fond of camel case or
>> other keyboard gymnastics (;-), but if someone has a great idea for
>> a better argument name.... ]
>>
>> Instead of only TRUE/FALSE, we could consider NA with
>> semantics "FALSE + warning" or also "TRUE + warning".
>>
>>
>> > b) decide what the default for a) will be. I have no strong opinion;
>> > I can see arguments in both directions.
>>
>> I think many have seen the good arguments in both directions.
>> I'm still strongly advocating that we value long term stability
>> higher here, and revert to more compatibility with the many
>> years of previous versions.
>>
>> If we used a default of 'exact=NA', I'd like it to mean
>> FALSE + warning, but would not object much to TRUE + warning.
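The "FALSE + warning" semantics Martin describes correspond to what released R (3.1.1 and later) calls numerals = "warn.loss": convert to the nearest double as "allow.loss" does, but signal a warning. A minimal sketch assuming R >= 3.1.1; the warning is captured with withCallingHandlers() so it can be inspected instead of printed.

```r
# Assumes R >= 3.1.1: "warn.loss" converts like "allow.loss" but warns.
x <- format(1/3, digits = 17)
warnings_seen <- character(0)

val <- withCallingHandlers(
  type.convert(x, as.is = TRUE, numerals = "warn.loss"),
  warning = function(w) {
    warnings_seen <<- c(warnings_seen, conditionMessage(w))  # record the message
    invokeRestart("muffleWarning")                           # then suppress it
  }
)

str(val)
warnings_seen
```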
>>
>> I agree that for the TRUE case, it may make more sense to return a
>> string-like object of a new (simple) class such as "bignum"
>> that was mentioned in this thread.
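To make the "string-like object of a new (simple) class" idea concrete, here is a minimal S3 sketch. Everything in it is hypothetical: no "bignum" class of this form exists in base R, and the constructor and methods are illustrative only. It keeps the original decimal string, so no digits are lost, and converts to a double only on request.

```r
# Hypothetical "bignum" class (illustration only): keep the original decimal
# strings and convert to double only when asked.
bignum <- function(x) structure(as.character(x), class = "bignum")

# S3 methods for as.numeric() must be written for as.double().
as.double.bignum <- function(x, ...) as.double(unclass(x))

print.bignum <- function(x, ...) {
  cat("<bignum>", unclass(x), "\n")
  invisible(x)
}

b <- bignum(format(1/3, digits = 17))
b               # prints the exact input string
as.numeric(b)   # nearest double
```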
>>
>> OTOH, this functionality should make it into an R 3.1.1 in the
>> not so distant future, and thinking through consequences and
>> implementing the new class approach may just take a tad too much
>> time...
>>
>> Martin
>>
>> > But most importantly I think a) is better than the status quo - even
>> > if the discussion about b) drags out.
>>
>> > Cheers,
>> > Simon
>>
>> ______________________________________________
>> R-devel at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-devel
>>
>
>