[Rd] Parallel compression support for saving to rds/rdata files?
Simon Urbanek
simon.urbanek at r-project.org
Thu Dec 15 16:43:12 CET 2016
> On Dec 15, 2016, at 12:08 AM, Kenny Bell <kmb56 at berkeley.edu> wrote:
>
> Hi,
>
> I have tried to follow the instructions in the ``save`` documentation and
> it doesn't seem to work (see below):
>
> mydata <- do.call(rbind, rep(iris, 10000))
> con <- pipe("pigz -p8 > fname.gz", "wb");
> save(mydata, file = con); close(con) # This runs
>
> R.utils::gunzip("fname.gz", "fname.RData", overwrite = TRUE)
> load("fname.RData") # Error: error reading from connection
>
> First question: Should the above work?
>
Not really. gzip is a bad example because it doesn't really support parallel compression (a gzip stream cannot be chopped into blocks by design), but you can do it with bzip2:

mydata <- do.call(rbind, rep(iris, 10000))
con <- pipe("pbzip2 -p8 > fname.bz2", "wb")
save(mydata, file = con)
close(con)
load("fname.bz2")

You can also read in parallel:

load(pipe("pbzip2 -dc fname.bz2"))
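
If you want the pipe hidden from the user, the same idea can be wrapped in a pair of small helpers. This is only a rough sketch, not anything that exists in R: the names save_rds_parallel, read_rds_parallel and have_pbzip2 and the threads argument are made up for illustration. It assumes pbzip2 is on the PATH and a Unix-like shell, and falls back to R's built-in (single-threaded) bzfile() connection otherwise.

```r
# Hypothetical helpers (not part of base R): saveRDS()/readRDS() through
# pbzip2 when it is available, otherwise through a plain bzfile() connection.

have_pbzip2 <- function() {
  # Sys.which() returns "" when the program is not found on the PATH
  nzchar(Sys.which("pbzip2"))
}

save_rds_parallel <- function(object, path, threads = 4L) {
  con <- if (have_pbzip2()) {
    # pbzip2 reads the serialized stream on stdin and writes to 'path'
    cmd <- sprintf("pbzip2 -p%d > %s", as.integer(threads), shQuote(path))
    pipe(cmd, "wb")
  } else {
    bzfile(path, "wb")  # single-threaded fallback
  }
  on.exit(close(con))   # closing the pipe waits for the compressor to finish
  saveRDS(object, con)
  invisible(path)
}

read_rds_parallel <- function(path) {
  con <- if (have_pbzip2()) {
    # -d decompresses, -c writes the result to stdout for R to read
    pipe(sprintf("pbzip2 -dc %s", shQuote(path)), "rb")
  } else {
    bzfile(path, "rb")
  }
  on.exit(close(con))
  readRDS(con)
}

# Round trip on a temporary file
tmp <- tempfile(fileext = ".rds.bz2")
save_rds_parallel(iris, tmp)
iris2 <- read_rds_parallel(tmp)
```

Either way the result is an ordinary bzip2 file, so a plain readRDS(tmp) should read it as well, the same way load("fname.bz2") does above.
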
Cheers,
Simon
> Second question: Is it possible to make this dummy friendly by allowing
> "pigz" as an option for ``compress`` in saveRDS and save? And in such a way
> that the decompressing is hidden from the user like normal?
>
> Thanks!
> Kenny
>
>
> --
> Kendon Bell
> Email: kmb56 at berkeley.edu
> Phone: (510) 612-3375
>
> Ph.D. Candidate
> Department of Agricultural & Resource Economics
> University of California, Berkeley
>
> [[alternative HTML version deleted]]
>
> ______________________________________________
> R-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>