[Rd] Decompressing raw vectors in memory
Duncan Temple Lang
duncan at wald.ucdavis.edu
Wed May 2 18:47:07 CEST 2012
I understand the desire not to have any dependency on additional
packages, and I have no desire to engage in any "mine's better" exchanges.
So I write this just for the record.
The gunzip() function in the Rcompression package handles this.
> library(RCurl); library(Rcompression)
> val = getURLContent("http://httpbin.org/gzip")
> cat(gunzip(val))
{
  "origin": "24.5.119.171",
  "headers": {
    "Content-Length": "",
    "Host": "httpbin.org",
    "Content-Type": "",
    "Connection": "keep-alive",
    "Accept": "*/*"
  },
  "gzipped": true,
  "method": "GET"
}
Just FWIW, as I really don't like writing to temporary files,
mostly so that we might move towards security in R.
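
For anyone who does want to stay within base R, here is a rough sketch of
the same in-memory idea, with no packages involved. It assumes the raw
vector holds gzip-format data (with the gzip header) and wraps it in
gzcon(rawConnection()). The helper name gunzip_raw() and the chunk size are
made up for illustration and are not part of Rcompression or base R. It
reads fixed-size chunks until readBin() returns nothing, so there is no need
to guess the decompressed size up front.

```r
# Hypothetical helper (not from Rcompression or base R): decompress a
# gzip-format raw vector in memory by reading from a gzcon() wrapper.
gunzip_raw <- function(x, chunk = 65536L) {
  stopifnot(is.raw(x))
  # rawConnection() exposes the bytes as a connection; gzcon() inflates them.
  con <- gzcon(rawConnection(x))
  on.exit(close(con), add = TRUE)
  pieces <- list()
  repeat {
    piece <- readBin(con, what = raw(), n = chunk)
    if (length(piece) == 0L) break          # nothing left to read
    pieces[[length(pieces) + 1L]] <- piece
  }
  if (length(pieces) == 0L) raw(0) else do.call(c, pieces)
}

# Sample input only: a temporary file is used here just to manufacture some
# gzip bytes for the demonstration; the decompression itself is in memory.
tmp <- tempfile(fileext = ".gz")
gz <- gzfile(tmp, open = "wb")
writeBin(charToRaw('{"gzipped": true}'), gz)
close(gz)
compressed <- readBin(tmp, what = raw(), n = file.info(tmp)$size)
unlink(tmp)

result <- rawToChar(gunzip_raw(compressed))
cat(result, "\n")
```

In the session above, the call would be gunzip_raw(val), provided val came
back as a raw vector rather than a character string.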
D.
Hadley Wickham wrote:
> > Well, it seems what you get there depends on the client, but I did
> >
> > tystie% curl -o foo "http://httpbin.org/gzip"
> > tystie% file foo
> > foo: gzip compressed data, last modified: Wed May 2 17:06:24 2012, max
> > compression
> >
> > and the final part worried me: I do not know if memDecompress() knows about
> > that format. The help page does not claim it can do anything other than
> > de-compress the results of memCompress() (although past experience has shown
> > that it can in some cases). gzfile() supports a much wider range of
> > formats.
>
> Ah, ok. Thanks. Then in that case it's probably just as easy to save
> it to a temp file and read that.
>
> con <- file(tmp) # R automatically detects compression
> open(con, "rb")
> on.exit(close(con), TRUE)
>
> readBin(con, raw(), file.info(tmp)$size * 10)
>
> The only challenge is figuring out what n to give readBin. Is there a
> good general strategy for this? Guess based on the file size and then
> iterate until result of readBin has length less than n?
>
> n <- file.info(tmp)$size * 2
> content <- readBin(con, raw(), n)
> n_read <- length(content)
> while(n_read == n) {
> more <- readBin(con, raw(), n)
> content <- c(content, more)
> n_read <- length(more)
> }
>
> Which is not great style, but there shouldn't be many reads.
>
> Hadley
>
>
> --
> Assistant Professor / Dobelman Family Junior Chair
> Department of Statistics / Rice University
> http://had.co.nz/
>
> ______________________________________________
> R-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel