[Rd] Memory allocation in read.table
Simon Urbanek
simon.urbanek at r-project.org
Wed Aug 28 20:52:45 CEST 2013
On Aug 28, 2013, at 2:24 PM, Hadley Wickham wrote:
>> Yup - parsing is the most expensive part. That's why for high-throughput data you don't want to use ASCII representation. It's amazing that the disk speeds are now so high that CPUs are the bottlenecks now, not vice versa.
>
> Do you have any recommendations for binary formats? For R, is there anything obviously better than Rdata?
>
Native formats are the fastest (and they are versatile), so
readBin/writeBin or mmap.
I tend to avoid strings (I store dates as POSIXct, which are doubles, and use factors, which are integers, for anything else), so the above works just fine for me.
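
A minimal sketch of the readBin/writeBin route, assuming one POSIXct column and one factor column. The file layout (a row count followed by the two vectors), the temporary file, and all variable names are illustrative assumptions, not something from this thread:

```r
# Hypothetical data: POSIXct values are doubles underneath, factors are integer codes.
n <- 5L
times <- as.POSIXct("2013-08-28 00:00:00", tz = "UTC") + seq_len(n) * 60
group <- factor(c("a", "b", "a", "c", "b"))

path <- tempfile(fileext = ".bin")

# Write the row count, then the raw doubles, then the integer codes.
con <- file(path, "wb")
writeBin(n, con)
writeBin(as.numeric(times), con)
writeBin(as.integer(group), con)
close(con)

# Read back in the same order; 'what' and 'n' must match what was written.
con <- file(path, "rb")
n_in <- readBin(con, what = "integer", n = 1L)
secs_in <- readBin(con, what = "double", n = n_in)
codes_in <- readBin(con, what = "integer", n = n_in)
close(con)

# Restore the R-level types. The factor levels are not stored in this
# layout, so they have to be kept somewhere else (here: still in memory).
times_in <- as.POSIXct(secs_in, origin = "1970-01-01", tz = "UTC")
group_in <- factor(levels(group)[codes_in], levels = levels(group))
```

No text parsing is involved on the way back in: readBin copies the bytes straight into a vector of the requested type. The price is that the file carries no metadata, so the reader has to know the column order, the types, and the factor levels by some other means.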
I am working on a way to do direct mmap serialization of SEXPs, but it's not ready yet (basic vectors are supported, but complex objects are not).
Cheers,
Simon