[R-SIG-Mac] R 3.2.2 Hangs Reading Files in El Capitan [Solved]

Charles DiMaggio charles.dimaggio at gmail.com
Sat May 7 15:59:16 CEST 2016


Glad the post elicited some discussion. I haven’t played with feather. I’ve used data.table, and it is indeed appreciably faster than base approaches for getting big CSVs into R. I also find dplyr (with, say, MonetDB) to be a good solution for out-of-memory work with large data sets. But for native R files, I’ve found RDS to be fastest.
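A minimal R sketch of the two fast native paths discussed in this thread: uncompressed RDS (`saveRDS(..., compress = FALSE)`) and raw binary I/O with `writeBin()`/`readBin()`. It assumes a plain numeric vector; the data and the temporary file names are illustrative only, and timings will depend on your disk and machine.

```r
# Illustrative data: one million doubles (numeric vectors serialize cheaply)
x <- runif(1e6)

rds_file <- tempfile(fileext = ".rds")
bin_file <- tempfile(fileext = ".bin")

# Uncompressed RDS: trades disk space for read/write speed
saveRDS(x, rds_file, compress = FALSE)
t_rds <- system.time(y_rds <- readRDS(rds_file))

# Raw binary I/O: no R serialization, but the reader must know the type and length
writeBin(x, bin_file)
t_bin <- system.time(y_bin <- readBin(bin_file, what = "double", n = length(x)))

# Elapsed seconds for each read; results vary by machine, so no winner is asserted
print(c(readRDS = t_rds[["elapsed"]], readBin = t_bin[["elapsed"]]))
```

The `readBin()` route only round-trips cleanly when you already know the element type, count, and byte order, whereas RDS stores that metadata for you.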


Cheers

Charles

On May 6, 2016, at 9:01 PM, Simon Urbanek <simon.urbanek at r-project.org> wrote:

> 
> On May 6, 2016, at 6:03 PM, Brandon Hurr <bhive01 at gmail.com> wrote:
> 
>> Simon,
>> 
>> It absolutely was about RDS, but R is all about choices, and the
>> underlying issue was the time to read in data, which fread and feather
>> are quite fast at. I assume when you say efficient you are referring to
>> disk space?
>> 
> 
> No, parsing data is always slower than native formats. The fastest is really readBin (and similar direct I/O approaches), followed by feather and RDS (the only reason RDS is not the fastest is that there is an extra in-memory copy), unless you have a slow disk, of course.
> 
> 
>> I put together a script to look at this further with and without
>> compression*. If speed is a priority over disk space, then Feather and
>> data.table (CSV) are good options**. CSV is portable to any system, and
>> feather can be used from Python/Julia. RDS/RDA save a lot of space but
>> are slower to write and read due to compression.
>> 
> 
> That's why I said uncompressed RDS [compress=FALSE] - you compress only if you want to save space, not speed :).
> 
> FWIW, according to our benchmarks iotools is the fastest for reading CSV if you want to get into that arena, but that's a whole other story. My point was that the question was NOT about CSV or anything parsed, nor about writing, which is why this is getting really off-topic.
> 
> Cheers,
> Simon
> 
> 
> 
>> I hope that's helpful to those thinking about their priorities for
>> file IO in R.
>> 
>> Brandon
>> 
>> * http://rpubs.com/bhive01/fileioinr
>> ** Writing a CSV with data.table is freaky fast if you can get OpenMP
>> working on your machine:
>> https://github.com/Rdatatable/data.table/issues/1692
>> Reading that same CSV is comparable to RDS.
>> 
>> 
>> On Fri, May 6, 2016 at 6:07 AM, Simon Urbanek
>> <simon.urbanek at r-project.org> wrote:
>>> Brandon,
>>> note that the post was about RDS, which is more efficient than all the options you list (in particular when not compressed). General advice is to avoid strings: numeric vectors are several orders of magnitude faster than strings to load/save.
>>> Cheers,
>>> Simon
>>> 
>>> 
>>>> On May 5, 2016, at 6:49 PM, Brandon Hurr <bhive01 at gmail.com> wrote:
>>>> 
>>>> You might be interested in the speed wars that are happening in the
>>>> file reading/writing space currently.
>>>> 
>>>> Matt Dowle/Arun Srinivasan's data.table and Hadley Wickham/Wes
>>>> McKinney's Feather have made huge speed advances in reading/writing
>>>> large datasets from disk (mostly CSV).
>>>> 
>>>> data.table fread()/fwrite():
>>>> https://github.com/Rdatatable/data.table
>>>> https://stackoverflow.com/questions/35763574/fastest-way-to-read-in-100-000-dat-gz-files
>>>> http://blog.h2o.ai/2016/04/fast-csv-writing-for-r/
>>>> 
>>>> 
>>>> Feather read_feather()/write_feather():
>>>> https://github.com/wesm/feather
>>>> 
>>>> I don't often have big datasets (mine are usually 10s of MBs), so I
>>>> don't see the benefits of these much, but you might.
>>>> 
>>>> HTH,
>>>> B
>>>> 
>>>> On Thu, May 5, 2016 at 3:16 PM, Charles DiMaggio
>>>> <charles.dimaggio at gmail.com> wrote:
>>>>> It's been a while, but I wanted to close the loop on a previous post describing R hanging on readRDS() and load() for largish (say 500MB or larger) files. I tried again with the recent release (3.3.0) and am able to read in large files under El Capitan. While the file is reading in, I get a disconcerting spinning pinwheel of death, and Force Quit reports that R is not responding. But if I wait it out, it eventually reads in. Odd, but I can live with it.
>>>>> 
>>>>> Cheers
>>>>> 
>>>>> Charles
>>>>> 
>>>>> Charles DiMaggio, PhD, MPH
>>>>> Professor of Surgery and Population Health
>>>>> Director of Injury Research
>>>>> Department of Surgery
>>>>> New York University School of Medicine
>>>>> 462 First Avenue, NBV 15
>>>>> New York, NY 10016-9196
>>>>> Charles.Dimaggio at nyumc.org
>>>>> Office: 212.263.3202
>>>>> Mobile: 516.308.6426
>>>>> 
>>>>> 
>>>>> _______________________________________________
>>>>> R-SIG-Mac mailing list
>>>>> R-SIG-Mac at r-project.org
>>>>> https://stat.ethz.ch/mailman/listinfo/r-sig-mac
>>>> 
>>>> 
>>> 
>> 
> 


