[R] How to benchmark speed of load/readRDS correctly
J C Nash
profjcnash at gmail.com
Tue Aug 22 16:11:39 CEST 2017
I'm not convinced Jeff is completely right that this does not concern R, since I've found that the application language (R,
perl, etc.) makes a difference in how files are accessed through the OS. He is certainly correct that the OS (and its version) is
where the actual reading and writing happens, but sometimes the calls into it can be inefficient. (Sorry, I've not got
examples specifically for file reads, but had a case in computation where there was an 800% (i.e., 8-fold) difference
in timing with R, which rather took my breath away. That's probably been sorted now.) The difficulty in making general
statements is that a rather full set of comparisons over different commands, datasets, OS and version variants is needed
before the general picture can emerge. Using microbenchmark when you need to find the bottlenecks is how I'd proceed,
which OP is doing.
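
A minimal sketch of such a comparison, for anyone who wants to try it. The data frame, its size and the helper names (read_rds, read_load, time_one) are assumptions made up for illustration; microbenchmark is used if it is installed, otherwise base system.time() stands in. Note that repeated reads of the same file are likely served from the OS file cache, so this mostly times decompression and unserializing rather than the disk.

```r
# Illustrative data only; the size and contents are arbitrary assumptions.
set.seed(1)
df <- data.frame(x = rnorm(1e5), y = sample(letters, 1e5, replace = TRUE))

# Write the same object in both formats to temporary files.
rds_file   <- tempfile(fileext = ".rds")
rdata_file <- tempfile(fileext = ".RData")
saveRDS(df, rds_file)
save(df, file = rdata_file)

# Hypothetical wrappers so both readers return the object itself.
read_rds  <- function() readRDS(rds_file)
read_load <- function() {
  e <- new.env()
  load(rdata_file, envir = e)
  e$df
}

if (requireNamespace("microbenchmark", quietly = TRUE)) {
  timings <- microbenchmark::microbenchmark(
    readRDS = read_rds(),
    load    = read_load(),
    times   = 20L
  )
  print(timings)
} else {
  # Base R fallback: elapsed seconds for each of `reps` repeated calls.
  time_one <- function(f, reps = 20L) {
    vapply(seq_len(reps), function(i) system.time(f())[["elapsed"]], numeric(1))
  }
  timings <- sapply(list(readRDS = read_rds, load = read_load), time_one)
  print(summary(timings))
}
```

To say anything general, the same script would have to be rerun over different datasets, compression settings, OS and versions, which is the difficulty mentioned above.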
About 30 years ago, I did write up some preliminary work, never published, on estimating the two halves of a copy, that
is, the reading from file and storing to "memory" or a different storage location. This was via regression with a
singular design matrix, but one can get a minimal length least squares solution via svd. Possibly relevant today to try
to get at slow links on a network.
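
A small sketch of the idea, not the original work: it assumes each copy time is the sum of a "read" cost at the source and a "write" cost at the destination, with three hypothetical locations A, B, C and made-up, noise-free timings. The indicator design matrix is then rank deficient (the read columns and the write columns both sum to a column of ones), and svd() gives the minimum length least squares solution.

```r
# Hypothetical locations; every ordered copy between two distinct ones.
locs  <- c("A", "B", "C")
pairs <- expand.grid(from = locs, to = locs, stringsAsFactors = FALSE)
pairs <- pairs[pairs$from != pairs$to, ]

# Design matrix: one indicator column per read source and per write target.
X <- cbind(outer(pairs$from, locs, "==") * 1,
           outer(pairs$to,   locs, "==") * 1)
colnames(X) <- c(paste0("read_", locs), paste0("write_", locs))

# Made-up "true" costs, used only to generate illustrative copy times.
true_read  <- c(1.0, 2.0, 3.0)
true_write <- c(0.5, 1.5, 2.5)
y <- drop(X %*% c(true_read, true_write))

# Minimum length least squares solution: invert only the singular
# values that are clearly nonzero (relative tolerance is an assumption).
s    <- svd(X)
keep <- s$d > sqrt(.Machine$double.eps) * max(s$d)
U    <- s$u[, keep, drop = FALSE]
V    <- s$v[, keep, drop = FALSE]
beta_min <- drop(V %*% (crossprod(U, y) / s$d[keep]))
names(beta_min) <- colnames(X)

print(sum(keep))   # numerical rank of X
print(beta_min)
```

The fitted copy times reproduce y, but the individual halves are only determined up to a constant that can be moved from all reads to all writes; the svd solution simply picks the shortest such vector.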
JN
On 2017-08-22 09:07 AM, Jeff Newmiller wrote:
> You need to study how reading files works in your operating system. This question is not about R.
>