[R] How to benchmark speed of load/readRDS correctly

Jeff Newmiller jdnewmil at dcn.davis.ca.us
Tue Aug 22 18:33:09 CEST 2017


Caching happens, both within the operating system and within the C standard library. Ostensibly the intent of those caches is to help performance, but you are right that different low-level caching algorithms can be a poor match for specific application-level use cases such as copying files or parsing text syntax. However, the OS and even the specific file system drivers (e.g. ext4 on flash disk or FAT32 on magnetic media) can behave quite differently for the same application-level use case, so a generic discussion at the R language level (this mailing list) can be almost impossible to sort out intelligently.
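Below is a minimal base-R sketch of timing `readRDS()` with the caching caveat in mind. The object, its size and the temporary files are hypothetical. Each file is read right after it is written, so the OS will usually still have it in its page cache and these are warm-cache timings. A true cold-cache read needs the OS cache flushed first, which is OS-specific and is not done here. Comparing a gzip-compressed file (the `saveRDS()` default) with an uncompressed one may help show how much of the time goes to work done in R itself, such as decompression, rather than to the file system.

```r
# Hypothetical test object: 2e6 doubles, about 16 MB
set.seed(1)
obj <- data.frame(x = runif(1e6), y = runif(1e6))

# Write the same object compressed (saveRDS default, gzip) and uncompressed
f_gz  <- tempfile(fileext = ".rds")
f_raw <- tempfile(fileext = ".rds")
saveRDS(obj, f_gz)
saveRDS(obj, f_raw, compress = FALSE)

# Time n repeated reads; the files were just written, so these reads are
# almost certainly served from the OS cache (warm-cache timings)
time_reads <- function(path, n = 5) {
  vapply(seq_len(n), function(i) {
    system.time(readRDS(path))[["elapsed"]]
  }, numeric(1))
}

t_gz  <- time_reads(f_gz)
t_raw <- time_reads(f_raw)

# Medians are less sensitive to one-off outliers than means
timings <- c(gzip = median(t_gz), uncompressed = median(t_raw))
print(timings)

# Both files must decode to the same object
stopifnot(identical(readRDS(f_gz), readRDS(f_raw)))
unlink(c(f_gz, f_raw))
```

With the microbenchmark package, `microbenchmark::microbenchmark(readRDS(f_gz), readRDS(f_raw), times = 5)` (run before the `unlink()` call) would make the same comparison and shares the same warm-cache caveat.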

On August 22, 2017 7:11:39 AM PDT, J C Nash <profjcnash at gmail.com> wrote:
>Not convinced Jeff is completely right that this does not concern R,
>since I've found that the application language (R, Perl, etc.) makes a
>difference in how files are accessed through the OS. He is certainly
>correct that the OS (and its version) is where the actual reading and
>writing happens, but sometimes the calls into it can be inefficient.
>(Sorry, I've not got examples specifically for file reads, but I had a
>case in computation where there was an 800% i.e., 80000 fold difference
>in timing with R, which rather took my breath away. That's probably
>been sorted now.) The difficulty in making general statements is that
>a rather full set of comparisons over different commands, datasets, OS
>and version variants is needed before the general picture can emerge.
>Using microbenchmark when you need to find the bottlenecks is how I'd
>proceed, which is what the OP is doing.
>
>About 30 years ago, I did write up some preliminary work, never
>published, on estimating the two halves of a copy, that is, the
>reading from file and the storing to "memory" or a different storage
>location. This was via regression with a singular design matrix, but
>one can get a minimum-length least-squares solution via the SVD. That
>may still be relevant today for trying to locate slow links on a
>network.
>
>JN
>
>On 2017-08-22 09:07 AM, Jeff Newmiller wrote:
>> You need to study how reading files works in your operating system.
>> This question is not about R.
>> 
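The quoted idea of splitting the cost of a copy into its read and write halves by regression with a singular design matrix can be illustrated with a small sketch. The setup is hypothetical and is not the original analysis: each copy of `s` MB reads `s` MB and writes `s` MB, so the two columns of the design matrix are identical and ordinary least squares has no unique solution. Inverting only the non-negligible singular values (using a relative tolerance of `sqrt(.Machine$double.eps)`, the default of `MASS::ginv()`) gives the minimum-length least-squares solution. The timings are synthetic.

```r
# Hypothetical copy sizes in MB; each copy reads and writes the same amount
sizes <- c(1, 2, 4, 8, 16)
X <- cbind(read = sizes, write = sizes)  # identical columns: X is rank 1

# Synthetic elapsed times: 0.3 s per MB copied (read + write together)
t_obs <- 0.3 * sizes

# Minimum-norm least squares via the SVD, X = U diag(d) V'
s <- svd(X)
tol <- sqrt(.Machine$double.eps) * max(s$d)
d_inv <- ifelse(s$d > tol, 1 / s$d, 0)   # drop (near-)zero singular values
beta <- s$v %*% (d_inv * (t(s$u) %*% t_obs))

print(drop(beta))  # both entries are 0.15, up to rounding
```

Any pair of per-MB costs summing to 0.3 fits these data equally well; the SVD picks the shortest such vector, which splits the cost evenly. Separating the two halves for real would need extra runs that vary reading and writing independently (for example, read-only timings), which make the columns of `X` linearly independent.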
