[Rd] Please make Pre-3.1 read.csv (type.convert) behavior available

Sat Apr 26 23:18:59 CEST 2014

On 26/04/2014, 4:12 PM, Tom Kraljevic wrote:
>
> Hi,
>
>
> One additional follow-up here.
>
> Unfortunately, I hit what looks like an R parsing bug that makes the Java Double.toHexString() output
> unreliable for reading by R.  (This is really unfortunate, because the format is intended to be lossless
> and it looks like it’s so close to fully working.)
>
> You can see the spec for the conversion here:
>      http://docs.oracle.com/javase/7/docs/api/java/lang/Double.html#toHexString(double)
>
> The last value in the list below is not parsed by R in the way I expected, and causes the column to flip
> from numeric to factor.
>
>
> -0x1.8ff831c7ffffdp-1
> -0x1.aff831c7ffffdp-1
> -0x1.bff831c7ffffdp-1
> -0x1.cff831c7ffffdp-1
> -0x1.dff831c7ffffdp-1
> -0x1.eff831c7ffffdp-1
> -0x1.fff831c7ffffdp-1           <<<<< this value is not parsed as a number and flips the column from numeric to factor.

That looks like a bug in the conversion code.  It uses the same test for 
lack of accuracy for hex doubles as it uses for decimal ones, but hex 
doubles can be larger before they lose precision.  I believe the largest 
integer with all 53 significand bits set, which is still represented 
exactly, is 2^53 - 1, i.e.

0x1.fffffffffffffp52

in this notation; can you confirm that your Java code reads it and 
writes the same string?  This is about 1% bigger than the limit at which 
type.convert switches to strings or factors.
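
A minimal sketch of that round-trip check, assuming the writer uses 
Double.toHexString() and the reader uses Double.parseDouble() (which 
accepts hex floating-point strings).  The class and method names are 
made up for illustration and are not taken from the code under 
discussion:

```java
// Hypothetical helper: checks whether a hex-double string survives
// parse -> format unchanged under Java's own conversions.
class HexDoubleRoundTrip {

    // True when parsing the string and formatting the result with
    // Double.toHexString() reproduces exactly the same text.
    static boolean roundTrips(String hex) {
        double value = Double.parseDouble(hex);
        return Double.toHexString(value).equals(hex);
    }

    public static void main(String[] args) {
        // 2^53 - 1: every bit of the 53-bit significand is set.
        double allBitsSet = (double) ((1L << 53) - 1);
        System.out.println(Double.toHexString(allBitsSet)); // 0x1.fffffffffffffp52

        // The value from the report that flips the column to a factor in R.
        String reported = "-0x1.fff831c7ffffdp-1";
        System.out.println(reported + " round-trips in Java: " + roundTrips(reported));
    }
}
```

If both strings round-trip on the Java side, the file itself is fine 
and the problem is confined to how R decides whether a field is numeric.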

Duncan Murdoch
>
>
> Below is the R output from adding one row at a time to “bad.csv”.
> The last attempt results in a factor rather than a numeric column.
>
> What’s really odd about it is that the .a through .e cases work fine but the .f case doesn’t.
>
>
> Thanks,
> Tom
>
>
>> bad.df = read.csv(file="/Users/tomk/bad.csv", header=F)
>> str(bad.df)
> 'data.frame':	1 obs. of  1 variable:
>   $ V1: num -0.781
>> bad.df = read.csv(file="/Users/tomk/bad.csv", header=F)
>> str(bad.df)
> 'data.frame':	2 obs. of  1 variable:
>   $ V1: num  -0.781 -0.844
>> bad.df = read.csv(file="/Users/tomk/bad.csv", header=F)
>> str(bad.df)
> 'data.frame':	3 obs. of  1 variable:
>   $ V1: num  -0.781 -0.844 -0.875
>> bad.df = read.csv(file="/Users/tomk/bad.csv", header=F)
>> str(bad.df)
> 'data.frame':	4 obs. of  1 variable:
>   $ V1: num  -0.781 -0.844 -0.875 -0.906
>> bad.df = read.csv(file="/Users/tomk/bad.csv", header=F)
>> str(bad.df)
> 'data.frame':	5 obs. of  1 variable:
>   $ V1: num  -0.781 -0.844 -0.875 -0.906 -0.937
>> bad.df = read.csv(file="/Users/tomk/bad.csv", header=F)
>> str(bad.df)
> 'data.frame':	6 obs. of  1 variable:
>   $ V1: num  -0.781 -0.844 -0.875 -0.906 -0.937 ...
>> bad.df = read.csv(file="/Users/tomk/bad.csv", header=F)
>> str(bad.df)
> 'data.frame':	7 obs. of  1 variable:
>   $ V1: Factor w/ 7 levels "-0x1.8ff831c7ffffdp-1",..: 1 2 3 4 5 6 7
>
>