[R] strangely long floating point with write.table()

Duncan Murdoch murdoch.duncan at gmail.com
Tue Mar 18 11:55:46 CET 2014


On 14-03-17 8:43 PM, Mike Miller wrote:
> On Mon, 17 Mar 2014, Duncan Murdoch wrote:
>
>> On 14-03-17 6:22 PM, Mike Miller wrote:
>>
>>> Thanks!  Another thing I've figured out:  Use of "drop0trailing=TRUE" in
>>> format() fixes the .00000 stuff that I didn't like:
>>>
>>> write.table(format(data[1:10,], digits=5, trim=TRUE, drop0trailing=TRUE), row.names=FALSE, col.names=FALSE, quote=FALSE)
> [snip]
>>>
>>> I still have more to figure out, but for most smaller table-writing
>>> jobs, I think something like the last command above will be my usual
>>> approach. In real life, I would use a tab delimiter, though.
>>>
>>> I'm still unsure about the best way for dealing with very large data
>>> frames, though.  There's probably a good way to stream data into a file
>>> so that it doesn't have to be written as an additional large object in
>>> memory.  There must be a way to make a connection and then just pipe
>>> the formatted data into it.  Maybe something related to sprintf() will
>>> work.
>>
>> You've never explained why you want to write these gigantic text files.
>> Text is a lossy way to store numbers:  it takes 15 bytes to store about
>> 8 bytes of information, and you'll probably lose a few bits at the end.
>> Why not write your files in binary, storing exactly what you have in
>> memory?  It'll be a lot faster to write and to read, you won't need to
>> duplicate the data before writing, etc.
>
>
> Thanks for asking, Duncan.  A typical problem is that I am running 12
> processes at once on a 12-core machine with 32 GB of RAM, so each process
> has to be limited to about 2.5 GB total.  Then I try to load as much data
> as I can within that limitation.  The output data does not always need to
> be in text format, but it usually does because it has to be read by other
> programs.

Other programs are unlikely to be able to read save() files, but they 
should be able to read the output of writeBin.  Not all programs can do 
it easily, e.g. I wouldn't want to try to do that in Excel (though I 
think you can using VBA), but most should be able to.
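
A minimal sketch of the writeBin() approach, under assumptions that are not
from the thread: the vector x, the temporary file path, and the file layout
(a 4-byte integer count followed by 8-byte doubles in the machine's native
byte order) are all made up for illustration. Any other program reading the
file would need to be told that layout.

```r
# Hypothetical data: a numeric vector standing in for one column of results.
x <- c(1.5, 2.25, pi, 1/3)

# Hypothetical file name in the session's temporary directory.
path <- tempfile(fileext = ".bin")

# Write the element count as a 4-byte integer, then the values as
# 8-byte doubles. No text conversion takes place.
con <- file(path, open = "wb")
writeBin(length(x), con, size = 4L)
writeBin(x, con, size = 8L)
close(con)

# Read the file back using the same layout.
con <- file(path, open = "rb")
n <- readBin(con, what = "integer", n = 1L, size = 4L)
y <- readBin(con, what = "double", n = n, size = 8L)
close(con)

# The doubles come back bit for bit, unlike a round trip through text
# written with a limited number of digits.
round_trip_ok <- identical(x, y)
```

With this layout the file takes 4 + 8 * length(x) bytes. If the reader runs
on a different architecture, the endian argument of writeBin() and readBin()
can be set explicitly instead of relying on the native default.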

The main reasons to use text files are so that humans can read the 
output or so that you can keep it for a long time and not worry about 
losing the documentation of the internal format; neither of those seems 
to apply to your use case.  Binary files are better for interprocess 
communication, because you skip two conversion steps.

Duncan Murdoch

>
> I was hoping I could read a line from a data frame and format it like
> this:
>
>> sprintf(c(rep("%s",2), rep("%d",2), rep("%.4f",4)), data[1,1:8])
>
> But sprintf reads vectors, so they have to be of a single type.
>
> Thanks for your help.
>
> Mike
>
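
On the sprintf() question quoted above: one possible workaround, sketched
here under assumptions rather than taken from the thread, is to pass each
column of the data frame to sprintf() as a separate argument via do.call().
sprintf() is vectorized over its arguments, so a single format string then
formats every row, and mixed column types are not a problem. Writing the
result to a connection a block of rows at a time keeps the formatted copy
small. The data frame, the format string, and the helper names format_rows
and write_in_chunks are hypothetical.

```r
# Hypothetical data frame with mixed column types.
data <- data.frame(
  id    = c("a1", "b2", "c3"),
  group = c("x", "y", "x"),
  count = c(1L, 20L, 300L),
  value = c(0.5, 1.23456789, 100),
  stringsAsFactors = FALSE
)

# One format string per row, tab-delimited (an assumed layout).
fmt <- "%s\t%s\t%d\t%.4f"

# Format all rows of df at once: each column becomes one sprintf() argument.
# unname() keeps column names from being matched to sprintf()'s own arguments.
format_rows <- function(df, fmt) {
  do.call(sprintf, c(list(fmt), unname(as.list(df))))
}

# Write df to path in blocks of chunk_size rows, so only one block of
# formatted text exists in memory at a time.
write_in_chunks <- function(df, fmt, path, chunk_size = 2L) {
  con <- file(path, open = "wt")
  on.exit(close(con))
  n_chunks <- ceiling(nrow(df) / chunk_size)
  starts <- seq(1L, by = chunk_size, length.out = n_chunks)
  for (s in starts) {
    rows <- s:min(s + chunk_size - 1L, nrow(df))
    writeLines(format_rows(df[rows, , drop = FALSE], fmt), con)
  }
  invisible(path)
}

lines <- format_rows(data, fmt)
out_path <- write_in_chunks(data, fmt, tempfile(fileext = ".txt"))
```

In practice chunk_size would be far larger than 2; the small value here only
makes the chunking visible on a three-row example.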



