[R] strangely long floating point with write.table()

Mike Miller mbmiller+l at gmail.com
Mon Mar 17 23:22:26 CET 2014


On Mon, 17 Mar 2014, Berend Hasselman wrote:

> On 17-03-2014, at 21:03, Mike Miller <mbmiller+l at gmail.com> wrote:
>
>> …...
>> data[,c(5:9,11,13,17:21)] <- signif(data[,c(5:9,11,13,17:21)], digits=5)
>>
>> Then write.table(data) does what I'd want.  It works better than format(). Example:
>>
>>> data2 <- data
>>> data2[,c(5:9,11,13,17:21)] <- signif(data2[,c(5:9,11,13,17:21)], digits=5)
>>>
>>> write.table(format(data[1:10,], digits=5, trim=T), row.names=F, col.names=F, quote=F)
>> 3100674 303164 6 1 -0.11869237 0.0073947 0.0084493 0.00012708 -0.1320 1 0 TT 1 GA 0 0 2 0.000 0 0.000 0.00000
>> 3100765 303321 6 1 0.01434426 -0.0136545 -0.0017613 0.08502718 1.0365 1 1 CT 1 GA 1 0 1 0.000 0 1.000 1.00000
>> 3101201 304352 6 1 -0.01710451 -0.0169568 0.0320392 0.00884896 0.4279 1 1 CT 2 GG 1 0 1 0.000 0 1.000 1.00000
>> 3101862 305250 6 1 -0.01328316 0.0108479 -0.0170081 -0.03692398 -0.4470 1 0 TT 1 GA 0 0 2 0.000 1 1.000 1.00000
>> 3103579 305847 6 1 0.01593935 0.0096043 -0.0437904 -0.02224669 -0.3365 1 0 TT 1 GA 0 0 2 0.000 0 0.000 0.00000
>> 3103645 305961 6 1 0.20441289 -0.1090142 0.2727132 -0.29890268 1.5818 1 2 CC 0 AA 2 0 0 0.000 0 2.000 4.00000
>> 3104098 308536 6 1 0.02842117 0.0562814 -0.0715448 -0.11510562 0.9974 1 0 TT 0 AA 0 1 1 0.944 0 0.944 0.89114
>> 3104361 306928 6 1 -0.04840401 0.0266719 -0.0548747 -0.03640484 0.4499 1 0 TT 0 AA 0 0 2 0.000 1 1.000 1.00000
>> 5100094 503136 6 1 0.19702704 -0.4104611 0.0869569 -0.03952420 0.3057 1 2 CC 0 AA 2 0 0 0.000 0 2.000 4.00000
>> 5100938 503615 6 1 0.00098838 0.0267176 0.0451301 0.04790277 -0.1743 2 1 CT 0 AA 1 0 1 0.000 0 1.000 1.00000
>>>
>>> write.table(data2[1:10,], row.names=F, col.names=F, quote=F)
>> 3100674 303164 6 1 -0.11869 0.0073947 0.0084493 0.00012708 -0.132 1 0 TT 1 GA 0 0 2 0 0 0 0
>> 3100765 303321 6 1 0.014344 -0.013654 -0.0017613 0.085027 1.0365 1 1 CT 1 GA 1 0 1 0 0 1 1
>> 3101201 304352 6 1 -0.017105 -0.016957 0.032039 0.008849 0.4279 1 1 CT 2 GG 1 0 1 0 0 1 1
>> 3101862 305250 6 1 -0.013283 0.010848 -0.017008 -0.036924 -0.447 1 0 TT 1 GA 0 0 2 0 1 1 1
>> 3103579 305847 6 1 0.015939 0.0096043 -0.04379 -0.022247 -0.3365 1 0 TT 1 GA 0 0 2 0 0 0 0
>> 3103645 305961 6 1 0.20441 -0.10901 0.27271 -0.2989 1.5818 1 2 CC 0 AA 2 0 0 0 0 2 4
>> 3104098 308536 6 1 0.028421 0.056281 -0.071545 -0.11511 0.9974 1 0 TT 0 AA 0 1 1 0.944 0 0.944 0.89114
>> 3104361 306928 6 1 -0.048404 0.026672 -0.054875 -0.036405 0.4499 1 0 TT 0 AA 0 0 2 0 1 1 1
>> 5100094 503136 6 1 0.19703 -0.41046 0.086957 -0.039524 0.3057 1 2 CC 0 AA 2 0 0 0 0 2 4
>> 5100938 503615 6 1 0.00098838 0.026718 0.04513 0.047903 -0.1743 2 1 CT 0 AA 1 0 1 0 0 1 1
>>
>> format() with digits=5 is still showing 7 significant digits.  Why? signif() only shows 5.
>
>
> From the help of format:
>
> digits "how many significant digits are to be used for numeric and 
> complex x. The default, NULL, uses getOption("digits"). This is a 
> suggestion: enough decimal places will be used so that the smallest (in 
> magnitude) number has this many significant digits, and also to satisfy 
> nsmall. (For the interpretation for complex numbers see signif.)"
>
> So, if I read this correctly, the smallest number will have 5 significant 
> digits. Larger numbers may get more, given the fixed width (see argument 
> trim).


Thanks!  Another thing I've figured out:  Using "drop0trailing=T" in 
format() removes the trailing zeros (the .00000 stuff) that I didn't like:

> write.table(format(data[1:10,], digits=5, trim=T, drop0trailing=T), row.names=F, col.names=F, quote=F)
3100674 303164 6 1 -0.11869237 0.0073947 0.0084493 0.00012708 -0.132 1 0 TT 1 GA 0 0 2 0 0 0 0
3100765 303321 6 1 0.01434426 -0.0136545 -0.0017613 0.08502718 1.0365 1 1 CT 1 GA 1 0 1 0 0 1 1
3101201 304352 6 1 -0.01710451 -0.0169568 0.0320392 0.00884896 0.4279 1 1 CT 2 GG 1 0 1 0 0 1 1
3101862 305250 6 1 -0.01328316 0.0108479 -0.0170081 -0.03692398 -0.447 1 0 TT 1 GA 0 0 2 0 1 1 1
3103579 305847 6 1 0.01593935 0.0096043 -0.0437904 -0.02224669 -0.3365 1 0 TT 1 GA 0 0 2 0 0 0 0
3103645 305961 6 1 0.20441289 -0.1090142 0.2727132 -0.29890268 1.5818 1 2 CC 0 AA 2 0 0 0 0 2 4
3104098 308536 6 1 0.02842117 0.0562814 -0.0715448 -0.11510562 0.9974 1 0 TT 0 AA 0 1 1 0.944 0 0.944 0.89114
3104361 306928 6 1 -0.04840401 0.0266719 -0.0548747 -0.03640484 0.4499 1 0 TT 0 AA 0 0 2 0 1 1 1
5100094 503136 6 1 0.19702704 -0.4104611 0.0869569 -0.0395242 0.3057 1 2 CC 0 AA 2 0 0 0 0 2 4
5100938 503615 6 1 0.00098838 0.0267176 0.0451301 0.04790277 -0.1743 2 1 CT 0 AA 1 0 1 0 0 1 1

That's pretty close to the signif() output I was getting (above) but with 
a few digits added because of the small numbers (as you explained).
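
To make the difference concrete, here is a minimal sketch using two values 
taken from the fifth column of the table above.  The variable names x, f and 
s are mine, just for illustration.  format() picks one number of decimal 
places for the whole vector, while signif() rounds each element on its own:

```r
# Two values from column 5 of the table above.
x <- c(0.19702704, 0.00098838)

# format() chooses enough decimal places that the smallest-magnitude
# element keeps 5 significant digits, then applies that to every element.
f <- format(x, digits = 5)
print(f)   # "0.19702704" "0.00098838"

# signif() rounds each element to 5 significant digits independently.
s <- signif(x, digits = 5)
print(s)
```

The small value needs 8 decimal places to show 5 significant digits, so the 
larger value is printed with 8 decimal places as well.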

format() with trim=T seems to just delete the spaces that format() would 
have added for column alignment.  It doesn't seem to affect the number of 
digits displayed.
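
A quick check of that reading, as a sketch on a made-up vector (the values 
and the names padded and trimmed are only for illustration):

```r
x <- c(1.5, -0.132, 10)

# Default: every element is padded with leading spaces to a common width.
padded <- format(x, digits = 5)

# trim = TRUE: the same digits, but without the padding.
trimmed <- format(x, digits = 5, trim = TRUE)

print(padded)
print(trimmed)

# Stripping the whitespace from the padded version gives the trimmed one.
print(identical(trimws(padded), trimmed))
```

If the last line prints TRUE, the only difference between the two calls is 
the padding.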

I still have more to figure out, but for most smaller table-writing jobs, 
I think something like the last command above will be my usual approach. 
In real life, I would use a tab delimiter, though.

I'm still unsure about the best way to deal with very large data 
frames.  There's probably a good way to stream data into a file so 
that the formatted copy doesn't have to be held as an additional large 
object in memory.  There must be a way to open a connection and then just 
write the formatted data to it.  Maybe something related to sprintf() will work.

Mike

