[Rd] Decompressing raw vectors in memory

Duncan Temple Lang duncan at wald.ucdavis.edu
Wed May 2 18:47:07 CEST 2012


I understand the desire not to have any dependency on additional
packages, and I have no desire to engage in any "mine's better" exchanges.
So I write this just for the record.
The gunzip() function handles this.

> library(RCurl); library(Rcompression)
> val = getURLContent("http://httpbin.org/gzip")
> cat(gunzip(val))
{
  "origin": "24.5.119.171",
  "headers": {
    "Content-Length": "",
    "Host": "httpbin.org",
    "Content-Type": "",
    "Connection": "keep-alive",
    "Accept": "*/*"
  },
  "gzipped": true,
  "method": "GET"
}


Just FWIW, as I really don't like writing to temporary files,
mostly so that we might move towards better security in R.
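For anyone who would rather avoid both the extra packages and the temporary file, base R can probably do the same job in memory. gzcon() wraps a connection and decompresses gzip data as it is read, and rawConnection() turns a raw vector such as val into a readable connection. What follows is a minimal sketch under those assumptions: gunzip_raw() and chunk_size are hypothetical names, and the sample gzip bytes are produced locally rather than fetched from httpbin.org.

```r
# Minimal sketch: gunzip a raw vector in memory with base R only.
# gunzip_raw() and chunk_size are hypothetical names chosen for this example.
gunzip_raw <- function(x, chunk_size = 65536L) {
  stopifnot(is.raw(x))
  # rawConnection() exposes the raw vector as a readable connection;
  # gzcon() wraps it and decompresses gzip-format data as it is read.
  con <- gzcon(rawConnection(x))
  on.exit(close(con))  # closing the wrapper also closes the raw connection
  chunks <- list()
  repeat {
    chunk <- readBin(con, what = raw(), n = chunk_size)
    if (length(chunk) == 0L) break          # end of the decompressed stream
    chunks[[length(chunks) + 1L]] <- chunk  # collect pieces, join once at the end
  }
  if (length(chunks) == 0L) raw(0) else unlist(chunks, use.names = FALSE)
}

# Demo only: a temporary file is used here just to manufacture sample gzip bytes.
payload <- charToRaw(paste(rep('"gzipped": true', 500), collapse = "\n"))
tmp <- tempfile(fileext = ".gz")
gz <- gzfile(tmp, "wb")
writeBin(payload, gz)
close(gz)
compressed <- readBin(tmp, what = raw(), n = file.info(tmp)$size)
unlink(tmp)

result <- gunzip_raw(compressed)
identical(result, payload)
```

Reading in fixed-size chunks and joining them once at the end means the decompressed length never has to be known in advance, which is the same difficulty raised in the quoted message below.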

   D.


Hadley Wickham wrote:
> > Well, it seems what you get there depends on the client, but I did
> >
> > tystie% curl -o foo "http://httpbin.org/gzip"
> > tystie% file foo
> > foo: gzip compressed data, last modified: Wed May  2 17:06:24 2012, max
> > compression
> >
> > and the final part worried me: I do not know if memDecompress() knows about
> > that format.  The help page does not claim it can do anything other than
> > de-compress the results of memCompress() (although past experience has shown
> > that it can in some cases).  gzfile() supports a much wider range of
> > formats.
> 
> Ah, ok.  Thanks.  Then in that case it's probably just as easy to save
> it to a temp file and read that.
> 
>   con <- file(tmp) # tmp is the saved file; R automatically detects compression
>   open(con, "rb")
>   on.exit(close(con), add = TRUE)
>
>   readBin(con, raw(), file.info(tmp)$size * 10) # guess: 10x the compressed size
> 
> The only challenge is figuring out what n to give readBin. Is there a
> good general strategy for this?  Guess based on the file size and then
> iterate until result of readBin has length less than n?
> 
>   n <- file.info(tmp)$size * 2        # initial guess at the uncompressed size
>   content <- readBin(con, raw(),  n)
>   n_read <- length(content)
>   while(n_read == n) {                # a full read means there may be more
>     more <- readBin(con, raw(),  n)
>     content <- c(content, more)
>     n_read <- length(more)
>   }
> 
> Which is not great style, but there shouldn't be many reads.
> 
> Hadley
> 
> 
> -- 
> Assistant Professor / Dobelman Family Junior Chair
> Department of Statistics / Rice University
> http://had.co.nz/
> 
> ______________________________________________
> R-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
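On the question of what n to give readBin: one alternative to iterating, when the data is a single-member gzip file under 4 GiB, is to read the uncompressed size from the file itself. RFC 1952 specifies that the last four bytes of a gzip member hold ISIZE, the uncompressed length modulo 2^32, stored little-endian. The sketch below assumes that layout and still checks for leftover data in case the assumption fails; gzip_isize() and read_gz_raw() are hypothetical helpers, not existing R functions.

```r
# Minimal sketch: size readBin()'s n from the gzip trailer instead of iterating.
# Per RFC 1952, the last 4 bytes of a gzip member store ISIZE, the uncompressed
# length modulo 2^32, little-endian. gzip_isize() and read_gz_raw() are hypothetical.
gzip_isize <- function(path) {
  size <- file.info(path)$size
  stopifnot(size >= 18)                    # 10-byte header + 8-byte trailer at minimum
  con <- file(path, "rb", raw = TRUE)      # raw = TRUE: no automatic decompression
  on.exit(close(con))
  seek(con, where = size - 4)
  isize <- readBin(con, what = "integer", n = 1L, size = 4L, endian = "little")
  if (isize < 0) isize + 2^32 else isize   # readBin() returns a signed 32-bit value
}

read_gz_raw <- function(path) {
  n <- gzip_isize(path)
  con <- gzfile(path, "rb")
  on.exit(close(con))
  content <- readBin(con, what = raw(), n = n)
  # ISIZE is exact only for a single-member file under 4 GiB, so check for leftovers.
  if (length(readBin(con, what = raw(), n = 1L)) > 0L)
    stop("more data than ISIZE reports; fall back to chunked reads")
  content
}

# Demo with a locally written file standing in for the downloaded one.
payload <- charToRaw(paste(rep("httpbin", 1000), collapse = " "))
tmp <- tempfile(fileext = ".gz")
gz <- gzfile(tmp, "wb")
writeBin(payload, gz)
close(gz)
gzip_isize(tmp) == length(payload)
identical(read_gz_raw(tmp), payload)
```

The trailer is read through a plain binary file connection so that the compressed bytes are seen as-is; the decompressed data is then read once, with a single one-byte probe afterwards instead of a loop.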