[R-SIG-Mac] R 3.2.2 Hangs Reading Files in El Capitan [Solved]

Brandon Hurr bhive01 at gmail.com
Sat May 7 00:03:46 CEST 2016


Simon,

The post absolutely was about RDS, but R is all about choices, and the
underlying issue was the time it takes to read in data, which fread and
feather are quite fast at. I assume that when you say efficient you are
referring to disk space?

I put together a script to look at this further, with and without
compression*. If speed is a priority over disk space, then Feather and
data.table (CSV) are good options**. CSV is portable to any system, and
Feather can be used from Python/Julia. RDS/RDA save a lot of space,
but are slower to write and read due to compression.
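
For anyone who wants to try the compression trade-off on their own
machine, here is a minimal sketch using only base R. It is not the script
linked in the footnote; the data frame, its column names and its size are
made up for illustration, and timings will vary with hardware and data.

```r
# Sketch: compare RDS round trips with and without compression (base R only).
# The example data below is hypothetical, not the data from the linked script.
set.seed(1)
n <- 1e5
df <- data.frame(
  id    = seq_len(n),
  value = rnorm(n),
  label = sample(c("a", "b", "c"), n, replace = TRUE),
  stringsAsFactors = FALSE
)

f_gz  <- tempfile(fileext = ".rds")
f_raw <- tempfile(fileext = ".rds")

# saveRDS() compresses with gzip by default; compress = FALSE turns that off.
t_write_gz  <- system.time(saveRDS(df, f_gz))
t_write_raw <- system.time(saveRDS(df, f_raw, compress = FALSE))

t_read_gz  <- system.time(df_gz  <- readRDS(f_gz))
t_read_raw <- system.time(df_raw <- readRDS(f_raw))

sizes <- file.size(c(f_gz, f_raw))

results <- data.frame(
  format  = c("RDS (gzip)", "RDS (uncompressed)"),
  write_s = c(t_write_gz[["elapsed"]], t_write_raw[["elapsed"]]),
  read_s  = c(t_read_gz[["elapsed"]], t_read_raw[["elapsed"]]),
  size_mb = sizes / 1024^2
)
print(results)

unlink(c(f_gz, f_raw))
```

The same pattern extends to fwrite()/fread() or write_feather()/
read_feather() by wrapping those calls in system.time(), provided the
packages are installed.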

I hope that's helpful to those thinking about their priorities for
file IO in R.

Brandon

* http://rpubs.com/bhive01/fileioinr
** Writing a CSV with data.table is freaky fast if you can get OpenMP
working on your machine
(https://github.com/Rdatatable/data.table/issues/1692). Reading that same
CSV is comparable to RDS.


On Fri, May 6, 2016 at 6:07 AM, Simon Urbanek
<simon.urbanek at r-project.org> wrote:
> Brandon,
> note that the post was about RDS which is more efficient than all the options you list (in particular when not compressed). General advice is to avoid strings. Numeric vectors are several orders of magnitude faster than strings to load/save.
> Cheers,
> Simon
>
>
>> On May 5, 2016, at 6:49 PM, Brandon Hurr <bhive01 at gmail.com> wrote:
>>
>> You might be interested in the speed wars that are happening in the
>> file reading/writing space currently.
>>
>> Matt Dowle/Arun Srinivasan's data.table and Hadley Wickham/Wes
>> McKinney's Feather have made huge speed advances in reading/writing
>> large datasets from disks (mostly csv).
>>
>> Data Table fread()/fwrite():
>> https://github.com/Rdatatable/data.table
>> https://stackoverflow.com/questions/35763574/fastest-way-to-read-in-100-000-dat-gz-files
>> http://blog.h2o.ai/2016/04/fast-csv-writing-for-r/
>>
>>
>> Feather read_feather()/write_feather()
>> https://github.com/wesm/feather
>>
>> I don't often have big datasets (10s of MBs) so I don't see the
>> benefits of these much, but you might.
>>
>> HTH,
>> B
>>
>> On Thu, May 5, 2016 at 3:16 PM, Charles DiMaggio
>> <charles.dimaggio at gmail.com> wrote:
>>> Been a while, but wanted to close the page on a previous post describing R hanging on readRDS() and load() for largish (say 500MB or larger) files. Tried again with recent release (3.3.0).  Am able to read in large files under El Cap.  While the file is reading in, I get a disconcerting spinning pinwheel of death and a check under Force Quit reports R is not responding.  But if I wait it out, it eventually reads in.  Odd.  But I can live with it.
>>>
>>> Cheers
>>>
>>> Charles
>>>
>>> Charles DiMaggio, PhD, MPH
>>> Professor of Surgery and Population Health
>>> Director of Injury Research
>>> Department of Surgery
>>> New York University School of Medicine
>>> 462 First Avenue, NBV 15
>>> New York, NY 10016-9196
>>> Charles.Dimaggio at nyumc.org
>>> Office: 212.263.3202
>>> Mobile: 516.308.6426
>>>
>>>
>>> _______________________________________________
>>> R-SIG-Mac mailing list
>>> R-SIG-Mac at r-project.org
>>> https://stat.ethz.ch/mailman/listinfo/r-sig-mac
>


