[R] compress data on read, decompress on write

Christos Hatzis christos.hatzis at nuverabio.com
Thu Feb 28 19:49:04 CET 2008


Ramon,

If you are looking for a solution to your specific application (as opposed
to a general compression/decompression mechanism), it might be worth
checking out the Matrix package, which has facilities for storing and
manipulating sparse matrices. Its triplet classes (TsparseMatrix) store
only the row indices, column indices and values of the non-zero
elements, which can give large space savings, depending on the size and
sparsity of the matrix.
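To make the triplet idea concrete, here is a minimal sketch in Python (not R, and not the Matrix package's own code) of why storing only (row, column, value) for the non-zero entries saves space. The functions `to_triplets` and `from_triplets` are hypothetical names used only for illustration.

```python
from typing import List, Tuple

# One stored entry: (row index, column index, value).
Triplet = Tuple[int, int, float]


def to_triplets(dense: List[List[float]]) -> List[Triplet]:
    """Keep only (row, column, value) for the non-zero entries."""
    return [
        (i, j, value)
        for i, row in enumerate(dense)
        for j, value in enumerate(row)
        if value != 0
    ]


def from_triplets(triplets: List[Triplet], nrow: int, ncol: int) -> List[List[float]]:
    """Rebuild the dense matrix; every position not listed is zero."""
    dense = [[0.0] * ncol for _ in range(nrow)]
    for i, j, value in triplets:
        dense[i][j] = value
    return dense


if __name__ == "__main__":
    n = 1000
    # Hypothetical 1000 x 1000 matrix with only the diagonal non-zero.
    dense = [[float(i + 1) if i == j else 0.0 for j in range(n)] for i in range(n)]
    triplets = to_triplets(dense)
    assert from_triplets(triplets, n, n) == dense  # lossless round trip
    print("dense values stored:", n * n)
    print("triplet numbers stored:", 3 * len(triplets))
```

Counting numbers rather than bytes, each non-zero entry costs three numbers instead of one, so the triplet form only pays off while fewer than one entry in three is non-zero; for the diagonal example above it stores 3,000 numbers instead of 1,000,000.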

-Christos 

> -----Original Message-----
> From: r-help-bounces at r-project.org 
> [mailto:r-help-bounces at r-project.org] On Behalf Of Ramon Diaz-Uriarte
> Sent: Thursday, February 28, 2008 1:18 PM
> To: r-help at stat.math.ethz.ch
> Subject: [R] compress data on read, decompress on write
> 
> Dear All,
> 
> I'd like to be able to have R store (in a list component) a 
> compressed data set, and then write it out uncompressed. 
> gzcon and gzfile work in exactly the opposite direction. What 
> would be a good way to handle this?
> 
> Details:
> ----------
> 
> We have a package that uses C; part of the C output is a
> large sparse matrix. This is never manipulated directly by R,
> but always by the C code. However, we need to store that data
> somewhere (inside an R object) for further calls to the
> functions in our package. We'd like to store that matrix as
> part of the R object (say, as an element of a list). Ideally,
> it would be stored in as compressed a way as possible.
> Then, when we need to use that information, it would be
> decompressed and passed to the C function.
> 
> I guess one way to do it is to have C deal with the
> compression and decompression (e.g., using the zlib or bzip2
> libraries) and then use readBin, etc., from R. But, if I can,
> I'd like to avoid having our C code call zlib, etc., so as
> to keep our package easily portable.
> 
> 
> Thanks,
> 
> R.
> 
> --
> Ramon Diaz-Uriarte
> Statistical Computing Team
> Structural Biology and Biocomputing Programme
> Spanish National Cancer Centre (CNIO)
> http://ligarto.org/rdiaz
>
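The round trip Ramon describes (keep the data compressed inside the object, expand it only when the C code needs it) does not need a compressed file connection such as gzfile; it can be done on an in-memory byte string. Below is a minimal sketch in Python using the standard struct and zlib modules, not R. The record layout (int32 row, int32 column, float64 value) and the function names are assumptions for illustration. In later versions of R, the base functions memCompress() and memDecompress() compress and decompress raw vectors in memory in a similar way.

```python
import struct
import zlib
from typing import List, Tuple

Triplet = Tuple[int, int, float]

# Assumed record layout: little-endian int32 row, int32 column, float64 value.
RECORD = struct.Struct("<iid")


def pack_and_compress(triplets: List[Triplet], level: int = 9) -> bytes:
    """Serialize the triplets to raw bytes, then deflate them in memory."""
    raw = b"".join(RECORD.pack(i, j, x) for i, j, x in triplets)
    return zlib.compress(raw, level)


def decompress_and_unpack(blob: bytes) -> List[Triplet]:
    """Inflate the blob and decode it back into (row, column, value) records."""
    raw = zlib.decompress(blob)
    return list(RECORD.iter_unpack(raw))


if __name__ == "__main__":
    # Hypothetical stand-in for the R list that carries the fitted object.
    fit = {"name": "example"}
    triplets = [(k, k, 1.0) for k in range(10_000)]
    fit["matrix"] = pack_and_compress(triplets)      # stored compressed
    restored = decompress_and_unpack(fit["matrix"])  # expanded only on use
    assert restored == triplets
    print("raw bytes:", len(triplets) * RECORD.size)
    print("compressed bytes:", len(fit["matrix"]))
```

How much deflate saves depends entirely on the data; highly regular records like the diagonal example shrink a lot, while noisy floating-point values may barely compress.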


