[Rd] type.convert and doubles

Duncan Murdoch murdoch.duncan at gmail.com
Sun Apr 20 21:28:59 CEST 2014


On 20/04/2014, 2:22 PM, Gábor Csárdi wrote:
> How about using the quoting to decide what should be character and what
> should not? You do not need to quote numbers or logical values, only
> character strings, so this would make sense IMO.

That explicitly violates some of the CSV "standards".  The quotes must 
have no effect on the interpretation.

Duncan Murdoch

>
> How about something like this:
> - if it is quoted (and not specified otherwise in colClasses), then it is a
> character/factor
> - if it is not quoted (and not specified otherwise in colClasses), then the
> type is automatically detected, according to the pre-3.1.x method, and a
> (suppressible) warning or error is given if information is lost, when
> coercing to numbers.
>
> Just an idea.
>
> Gabor
>
> On Sun, Apr 20, 2014 at 3:24 AM, Murray Stokely <murray at stokely.org> wrote:
>
>> Yes, I'm also strongly in favor of having an option for this.  If
>> there was an option in base R for controlling this we would just use
>> that and get rid of the separate RProtoBuf.int64AsString option we use
>> in the RProtoBuf package on CRAN to control whether 64-bit int types
>> from C++ are returned to R as numerics or character vectors.
>>
>> I agree that reasonable people can disagree about the default, but I
>> found my original bug report about this, so I will counter Robert's
>> example with my favorite example of what was wrong with the previous
>> behavior:
>>
>> tmp<-data.frame(n=c("72057594037927936", "72057594037927937"),
>> name=c("foo", "bar"))
>> length(unique(tmp$n))
>> # 2
>> write.csv(tmp, "/tmp/foo.csv", quote=FALSE, row.names=FALSE)
>> data <- read.csv("/tmp/foo.csv")
>> length(unique(data$n))
>> # 1
>>
>>            - Murray
>>
>>
>> On Sat, Apr 19, 2014 at 10:06 AM, Simon Urbanek
>> <simon.urbanek at r-project.org> wrote:
>>> On Apr 19, 2014, at 9:00 AM, Martin Maechler
>>> <maechler at stat.math.ethz.ch> wrote:
>>>
>>>>>>>>> McGehee, Robert <Robert.McGehee at geodecapital.com>
>>>>>>>>>     on Thu, 17 Apr 2014 19:15:47 -0400 writes:
>>>>
>>>>>> This is all application specific and
>>>>>> sort of beyond the scope of type.convert(), which now behaves as it
>>>>>> has been documented to behave.
>>>>
>>>>> That's only a true statement because the documentation was changed to
>>>>> reflect the new behavior! The new feature in type.convert certainly does
>>>>> not behave according to the documentation as of R 3.0.3. Here's a snippet:
>>>>
>>>>> The first type that can accept all the
>>>>> non-missing values is chosen (numeric and complex return values
>>>>> will represented approximately, of course).
>>>>
>>>>> The key phrase is in parentheses, which reminds the user to expect a
>>>>> possible loss of precision. That important parenthetical was removed from
>>>>> the documentation in R 3.1.0 (among other changes).
>>>>
>>>>> Putting aside the fact that this introduces a large amount of
>>>>> unnecessary work rewriting SQL/data import code and SQL packages, my
>>>>> biggest conceptual problem is that I can no longer rely on a particular
>>>>> function call returning a particular class. In my example querying stock
>>>>> prices, about 5% of prices came back as factors and the remaining 95% as
>>>>> numeric, so we had random errors popping up throughout the morning.
>>>>
>>>>> Here's a short example showing how the new behavior can be
>>>>> unreliable. I pass a character representation of a uniformly distributed
>>>>> random variable to type.convert. 90% of the time it is converted to
>>>>> "numeric" and 10% of the time it is a "factor" (in R 3.1.0). In the 10% of
>>>>> cases in which type.convert converts to a factor, the leading non-zero
>>>>> digit is always a 9. So if you were expecting a numeric value, then 1 in
>>>>> 10 times you may have a bug in your code that didn't exist before.
>>>>
>>>>>> options(digits=16)
>>>>>> cl <- NULL; for (i in 1:10000) cl[i] <- class(type.convert(format(runif(1))))
>>>>>> table(cl)
>>>>> cl
>>>>>  factor numeric
>>>>>     990    9010
>>>>
>>>> Yes.
>>>>
>>>> Murray's point is valid, too.
>>>>
>>>> But in my view, with the reasoning we have seen here,
>>>> *and* with the well known software design principle of
>>>> "least surprise" in mind,
>>>> I also do think that the default for type.convert() should be what
>>>> it has been for > 10 years now.
>>>>
>>>
>>> I think there should be two separate discussions:
>>>
>>> a) have an option (argument to type.convert and possibly read.table) to
>>> enable/disable this behavior. I'm strongly in favor of this.
>>>
>>> b) decide what the default for a) will be. I have no strong opinion; I
>>> can see arguments in both directions.
>>>
>>> But most importantly I think a) is better than the status quo - even if
>>> the discussion about b) drags out.
>>>
>>> Cheers,
>>> Simon
>>>
>>>
>>>
>>
>> ______________________________________________
>> R-devel at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-devel
>>
>
>
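For reference, Simon's option a) matches what later R releases provide, to the best of my knowledge: a numerals argument to type.convert() (also accepted by read.table() and the read.* wrappers) taking the values "allow.loss", "warn.loss" and "no.loss". The sketch below applies it to the two 17-digit strings from Murray's example. It assumes an R version that has this argument, and it passes as.is = TRUE explicitly so the lossless result is a character vector rather than a factor.

```r
# The two strings from Murray's example. As doubles, both round to 2^56
# (doubles near 2^56 are spaced 16 apart), so numeric conversion loses
# information.
x <- c("72057594037927936", "72057594037927937")

# "allow.loss": convert to double even when accuracy is lost
# (assumed to reproduce the pre-3.1.0 behaviour).
x_lossy <- type.convert(x, as.is = TRUE, numerals = "allow.loss")

# "no.loss": leave the values as character when conversion would lose accuracy.
x_safe <- type.convert(x, as.is = TRUE, numerals = "no.loss")

length(unique(x_lossy))  # 1: the two values collapse into one
length(unique(x_safe))   # 2: both values are kept
```

When only one known column is at risk, read.csv(..., colClasses = c(n = "character")) avoids the type guessing for that column entirely, which is closer to the per-column control Gabor alludes to via colClasses.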


