[R] How to benchmark speed of load/readRDS correctly

Tue Aug 22 16:11:39 CEST 2017

I'm not convinced Jeff is completely right that this does not concern R, since I've found that the application language
(R, Perl, etc.) makes a difference in how files are accessed through the OS. He is certainly correct that the OS (and its
version) is where the actual reading and writing happens, but sometimes the calls to it can be inefficient. (Sorry, I've not got
examples specifically for file reads, but had a case in computation where there was an 800-fold, i.e., 80000%, difference
in timing with R, which rather took my breath away. That's probably been sorted now.) The difficulty in making general 
statements is that a rather full set of comparisons over different commands, datasets, OS and version variants is needed 
before the general picture can emerge. Using microbenchmark to find the bottlenecks is how I'd proceed, which is what
the OP is doing.
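
For what it's worth, a minimal sketch of the kind of comparison I mean. The data frame, its size, the helper time_reads() and the number of repetitions are all made up for illustration, and the files are read straight after being written, so they are most likely served from the OS file cache. Treat the numbers as warm-cache timings on one machine, nothing more.

```r
# Hypothetical test data; size and contents are arbitrary choices.
set.seed(1)
df <- data.frame(x = rnorm(1e5), y = sample(letters, 1e5, replace = TRUE))

rds_file   <- tempfile(fileext = ".rds")
rdata_file <- tempfile(fileext = ".RData")
saveRDS(df, rds_file)
save(df, file = rdata_file)

# Hypothetical helper: elapsed time (seconds) of repeated calls to read_fun().
# Keeping every timing lets you look at the spread, not just one number.
time_reads <- function(read_fun, times = 10L) {
  vapply(seq_len(times), function(i) {
    unname(system.time(read_fun())["elapsed"])
  }, numeric(1))
}

t_rds  <- time_reads(function() readRDS(rds_file))
t_load <- time_reads(function() load(rdata_file, envir = new.env()))

print(summary(t_rds))
print(summary(t_load))

# If the microbenchmark package happens to be installed, use it as well;
# it has finer timer resolution than system.time().
if (requireNamespace("microbenchmark", quietly = TRUE)) {
  print(microbenchmark::microbenchmark(
    readRDS = readRDS(rds_file),
    load    = load(rdata_file, envir = new.env()),
    times   = 10L
  ))
}

unlink(c(rds_file, rdata_file))
```

Repeating this over other datasets, compression settings, OS versions and storage devices is the tedious part, and that is where the general picture has to come from.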

About 30 years ago, I did write up some preliminary work, never published, on estimating the two halves of a copy, that 
is, the reading from file and storing to "memory" or a different storage location. This was via regression with a 
singular design matrix, but one can get a minimal length least squares solution via svd. Possibly relevant today to try 
to get at slow links on a network.
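
A rough sketch of that idea, not the original unpublished work: the three devices, their "true" read and write costs, and the noise level are invented here purely to generate synthetic timings. Each copy time is modelled as a read cost at the source plus a write cost at the destination. The indicator design matrix is singular, because adding a constant to every read cost and subtracting it from every write cost leaves the fit unchanged, so the SVD is used to pick the minimal length solution.

```r
# Hypothetical setup: three devices, copies between every distinct pair.
devices <- c("A", "B", "C")
pairs <- expand.grid(src = devices, dst = devices, stringsAsFactors = FALSE)
pairs <- pairs[pairs$src != pairs$dst, ]

# Made-up costs, used only to simulate copy timings.
read_cost  <- c(A = 1.0, B = 2.0, C = 4.0)
write_cost <- c(A = 1.5, B = 3.0, C = 6.0)
set.seed(42)
y <- unname(read_cost[pairs$src] + write_cost[pairs$dst]) +
  rnorm(nrow(pairs), sd = 0.01)

# Indicator design matrix: one column per read source, one per write target.
X <- cbind(outer(pairs$src, devices, "==") * 1,
           outer(pairs$dst, devices, "==") * 1)
colnames(X) <- c(paste0("read_", devices), paste0("write_", devices))

# Minimal length least squares solution via the SVD:
# drop the (numerically) zero singular values, then invert the rest.
s    <- svd(X)
tol  <- sqrt(.Machine$double.eps) * max(s$d)
keep <- s$d > tol
U <- s$u[, keep, drop = FALSE]
V <- s$v[, keep, drop = FALSE]
beta <- drop(V %*% ((t(U) %*% y) / s$d[keep]))
names(beta) <- colnames(X)

print(sum(keep))   # numerical rank of X
print(beta)        # minimal length estimates
print(drop(X %*% beta) - y)  # residuals of the fit
```

The individual estimates are not the "true" costs, since only combinations such as read plus write for a given pair are identifiable, but differences between read costs (or between write costs) are, and those are what would point at a slow link.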

JN

On 2017-08-22 09:07 AM, Jeff Newmiller wrote:
> You need to study how reading files works in your operating system. This question is not about R.
>