[R] compress data on read, decompress on write

Ramon Diaz-Uriarte rdiaz02 at gmail.com
Fri Feb 29 19:04:01 CET 2008


Thanks, Greg. Yes, I'd store the compressed stuff as a raw data type.
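A minimal sketch of the raw-vector approach follows. It assumes a version of R that provides memCompress() and memDecompress() in base; these functions arrived in later R releases, so they may not have been available when this thread was written. The helper names pack_blob() and unpack_blob() and the example payload are hypothetical. The object is serialized to a raw vector, compressed in memory, and kept as a list component, so the package's own C code never has to call zlib.

```r
# Hypothetical helper: serialize any R object and compress it in memory.
# The result is a raw vector that can be stored as a list component.
pack_blob <- function(x, type = "gzip") {
  memCompress(serialize(x, connection = NULL), type = type)
}

# Hypothetical helper: the inverse of pack_blob().
unpack_blob <- function(blob, type = "gzip") {
  unserialize(memDecompress(blob, type = type))
}

# Stand-in for the C output (an assumption, for illustration only).
payload <- rep(c(0L, 0L, 3L, 0L), times = 5000L)

obj <- list(call = "fit", blob = pack_blob(payload))

is.raw(obj$blob)                                      # TRUE
length(obj$blob) < length(serialize(payload, NULL))  # TRUE
identical(unpack_blob(obj$blob), payload)             # TRUE
```

When the C code needs the data again, unpack_blob() (or memDecompress() alone, if the stored bytes were raw C output rather than a serialized R object) yields the uncompressed version, which can then be passed via .Call() or written out with writeBin().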

Best,

R.

On Thu, Feb 28, 2008 at 11:54 PM, Gregory Warnes <gregory.warnes at mac.com> wrote:
>
>  You might look at storing the data using R's "raw" data type...
>
>  -G
>
>  On Feb 28, 2008, at 5:38 PM, Ramon Diaz-Uriarte wrote:
>
>  > Dear Christos,
>  >
>  > Thanks for your reply. Actually, I should have been more careful with
>  > language: it's not really a sparse matrix, but rather a ragged array
>  > that results from a more compact representation we thought of for the
>  > hidden states in a Hidden Markov Model in many runs of MCMC. However,
>  > it might make sense for us to check sparseMatrix and see how it's done
>  > there.
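For a ragged array, one common compact layout stores all rows concatenated into a single value vector plus a vector of row lengths, similar in spirit to the compressed sparse formats in Matrix. This is an assumption for illustration, not necessarily the representation described above. A minimal base-R sketch, with hypothetical helper names:

```r
# Hypothetical helper: flatten a ragged list of integer vectors into
# one value vector plus per-row lengths (rows may be empty).
flatten_ragged <- function(rows) {
  list(values  = unlist(rows, use.names = FALSE),
       lengths = vapply(rows, length, integer(1)))
}

# Hypothetical helper: rebuild the original list from the flat form.
# A factor with explicit levels keeps empty rows as integer(0).
unflatten_ragged <- function(flat) {
  n <- length(flat$lengths)
  g <- factor(rep(seq_len(n), flat$lengths), levels = seq_len(n))
  unname(split(flat$values, g))
}

rows <- list(c(1L, 2L, 2L), integer(0), c(3L, 1L))
flat <- flatten_ragged(rows)
flat$values                               # [1] 1 2 2 3 1
flat$lengths                              # [1] 3 0 2
identical(unflatten_ragged(flat), rows)   # TRUE
```

The two flat vectors are plain integer vectors, so they can be handed to C directly or serialized and compressed as a single object.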
>  >
>  > Thanks,
>  >
>  > R
>  >
>  > On Thu, Feb 28, 2008 at 7:49 PM, Christos Hatzis
>  > <christos.hatzis at nuverabio.com> wrote:
>  >> Ramon,
>  >>
>  >>  If you are looking for a solution to your specific application (as opposed
>  >>  to a general compression/decompression mechanism), it might be worth
>  >>  checking out the Matrix package, which has facilities for storing and
>  >>  manipulating sparse matrices. The sparseMatrix class can store matrices in
>  >>  the triplet representation (i.e., only the indices and values of the
>  >>  non-zero elements), and this affords great compression ratios, depending
>  >>  on the size and degree of sparseness of the matrix.
>  >>
>  >>  -Christos
>  >>
>  >>
>  >>
>  >>> -----Original Message-----
>  >>> From: r-help-bounces at r-project.org
>  >>> [mailto:r-help-bounces at r-project.org] On Behalf Of Ramon Diaz-Uriarte
>  >>> Sent: Thursday, February 28, 2008 1:18 PM
>  >>> To: r-help at stat.math.ethz.ch
>  >>> Subject: [R] compress data on read, decompress on write
>  >>>
>  >>> Dear All,
>  >>>
>  >>> I'd like to be able to have R store (in a list component) a
>  >>> compressed data set, and then write it out uncompressed.
>  >>> gzcon and gzfile work in exactly the opposite direction. What
>  >>> would be a good way to handle this?
>  >>>
>  >>> Details:
>  >>> ----------
>  >>>
>  >>> We have a package that uses C; part of the C output is a
>  >>> large sparse matrix. This is never manipulated directly by R,
>  >>> but always by the C code. However, we need to store that data
>  >>> somewhere (inside an R object) for further calls to the
>  >>> functions in our package.
>  >>> We'd like to store that matrix as part of the R object (say,
>  >>> as an element of a list). Ideally, it would be stored in as
>  >>> compressed a way as possible.
>  >>> Then, when we need to use that information, it would be
>  >>> decompressed and passed to the C function.
>  >>>
>  >>> I guess one way to do it is to have C deal with the
>  >>> compression and decompression (e.g., using the zlib or bzip2
>  >>> libraries) and then use readBin, etc., from R. But, if I can,
>  >>> I'd like to avoid our C code having to call zlib, etc., so as
>  >>> to make our package easily portable.
>  >>>
>  >>>
>  >>> Thanks,
>  >>>
>  >>> R.
>  >>>
>  >>> --
>  >>> Ramon Diaz-Uriarte
>  >>> Statistical Computing Team
>  >>> Structural Biology and Biocomputing Programme
>  >>> Spanish National Cancer Centre (CNIO)
>  >>> http://ligarto.org/rdiaz
>  >>>
>  >>> ______________________________________________
>  >>> R-help at r-project.org mailing list
>  >>> https://stat.ethz.ch/mailman/listinfo/r-help
>  >>> PLEASE do read the posting guide
>  >>> http://www.R-project.org/posting-guide.html
>  >>> and provide commented, minimal, self-contained, reproducible code.
>  >>>
>  >>>
>  >>
>  >>
>  >>
>  >
>
>



-- 
Ramon Diaz-Uriarte
Statistical Computing Team
Structural Biology and Biocomputing Programme
Spanish National Cancer Centre (CNIO)
http://ligarto.org/rdiaz


