[R] How to benchmark speed of load/readRDS correctly

raphael.felber at agroscope.admin.ch
Wed Aug 23 14:40:20 CEST 2017


Hi there

Thanks for your answers. I didn't expect this to be so complex. Honestly, I don't understand everything you wrote, since I'm not an IT specialist. But I had read that reading *.rds files is faster than loading *.RData files, and I wanted to verify that for my system and R version. Thanks anyway for your time.
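For readers landing on this thread, a minimal, self-contained sketch of one way to run the comparison the OP describes (file names and data are illustrative; base system.time() is used to avoid dependencies, though the microbenchmark package mentioned later in the thread gives better statistics):

```r
## Illustrative data set; substitute any object of interest.
x <- data.frame(a = rnorm(1e5), b = rnorm(1e5))

rds_file   <- tempfile(fileext = ".rds")
rdata_file <- tempfile(fileext = ".RData")
saveRDS(x, rds_file)
save(x, file = rdata_file)

## Repeat each read so one-off disk effects average out. Note these are
## warm-cache timings: after the first pass the OS has the files cached.
t_rds   <- system.time(for (i in 1:50) y <- readRDS(rds_file))
t_rdata <- system.time(for (i in 1:50) load(rdata_file))

print(t_rds)
print(t_rdata)
unlink(c(rds_file, rdata_file))
```

Which of the two is faster can differ across R versions, operating systems and compression settings, which is presumably why the replies below warn against generalising from a single measurement.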

Cheers Raphael


> -----Original Message-----
> From: Jeff Newmiller [mailto:jdnewmil at dcn.davis.ca.us]
> Sent: Tuesday, 22 August 2017 18:33
> To: J C Nash <profjcnash at gmail.com>; r-help at r-project.org; Felber Raphael
> Agroscope <raphael.felber at agroscope.admin.ch>
> Subject: Re: [R] How to benchmark speed of load/readRDS correctly
> 
> Caching happens, both within the operating system and within the C
> standard library. Ostensibly the intent for those caches is to help
> performance, but you are right that different low-level caching algorithms
> can be a poor match for specific application level use cases such as copying
> files or parsing text syntax. However, the OS and even the specific file
> system drivers (e.g. ext4 on flash disk or FAT32 on magnetic media) can
> behave quite differently for the same application level use case, so a generic
> discussion at the R language level (this mailing list) can be almost impossible
> to sort out intelligently.
> --
> Sent from my phone. Please excuse my brevity.
> 
> On August 22, 2017 7:11:39 AM PDT, J C Nash <profjcnash at gmail.com>
> wrote:
> >Not convinced Jeff is completely right about this not concerning R,
> >since I've found that the application language (R, Perl, etc.) makes a
> >difference in how files are accessed through the OS. He is certainly correct
> >that OS (and versions) are where the actual reading and writing
> >happens, but sometimes the call to those can be inefficient. (Sorry,
> >I've not got examples specifically for file reads, but had a case in
> >computation where there was an 800-fold, i.e., 80000%, difference in
> >timing with R, which rather took my breath away. That's probably been
> >sorted now.) The difficulty in making general statements is that a
> >rather full set of comparisons over different commands, datasets, OS
> >and version variants is needed before the general picture can emerge.
> >Using microbenchmark when you need to find the bottlenecks is how I'd
> >proceed, which OP is doing.
> >
> >About 30 years ago, I did write up some preliminary work, never
> >published, on estimating the two halves of a copy, that is, the reading
> >from file and storing to "memory" or a different storage location. This
> >was via regression with a singular design matrix, but one can get a
> >minimum-length least-squares solution via the SVD. Possibly relevant today
> >to try to get at slow links on a network.
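As an illustration of the approach sketched here (the design matrix and data below are invented for the example): with a singular design matrix, solve() or lm() cannot return a unique answer, but the pseudoinverse built from svd() yields the minimum-length least-squares solution.

```r
## Two 'predictors' that are never observed separately, so the columns are
## proportional and the design matrix has rank 1 (i.e. it is singular).
A <- cbind(c(1, 1, 1, 1), c(2, 2, 2, 2))
b <- c(3.1, 2.9, 3.0, 3.0)

s    <- svd(A)
tol  <- max(dim(A)) * max(s$d) * .Machine$double.eps
dinv <- ifelse(s$d > tol, 1 / s$d, 0)        # invert only nonzero singular values
x_min <- s$v %*% (dinv * crossprod(s$u, b))  # x = V D^+ t(U) b

## Among all least-squares solutions satisfying x1 + 2*x2 = mean(b) = 3,
## this picks the one of smallest norm: c(0.6, 1.2).
```

The zeroing of tiny singular values is what makes the solution well defined despite the singular design.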
> >
> >JN
> >
> >On 2017-08-22 09:07 AM, Jeff Newmiller wrote:
> >> You need to study how reading files works in your operating system.
> >This question is not about R.
> >>


More information about the R-help mailing list