[Rd] Decompressing raw vectors in memory
Prof Brian Ripley
ripley at stats.ox.ac.uk
Wed May 2 18:16:29 CEST 2012
On 02/05/2012 16:43, Hadley Wickham wrote:
>>> I'm struggling to decompress a gzip'd raw vector in memory:
>>>
>>> content <- readBin("http://httpbin.org/gzip", "raw", 1000)
>>>
>>> memDecompress(content, type = "gzip")
>>> # Error in memDecompress(content, type = "gzip") :
>>> # internal error -3 in memDecompress(2)
>>>
>>> I'm reasonably certain that the file is correctly compressed, because
>>> if I save it out to a file, I can read the uncompressed data:
>>>
>>> tmp <- tempfile()
>>> writeBin(content, tmp)
>>> readLines(tmp)
>>>
>>> So that suggests I'm using memDecompress incorrectly. Any hints?
>>
>> Headers.
>
> Looking at http://tools.ietf.org/html/rfc1952:
>
> * the first two bytes are id1 and id2, which are 1f 8b as expected
>
> * the third byte is the compression: deflate (as.integer(content[3]))
>
> * the fourth byte is the flag
>
> rawToBits(content[4])
> [1] 00 00 00 00 00 00 00 00
>
> which indicates no extra header fields are present
>
> So the header looks ok to me (with my limited knowledge of gzip)
>
> Stripping off the header doesn't seem to help either:
>
> memDecompress(content[-(1:10)], type = "gzip")
> # Error in memDecompress(content[-(1:10)], type = "gzip") :
> # internal error -3 in memDecompress(2)
>
> I've read the help for memDecompress but I don't see anything there to help me.
>
> Any more hints?
Well, it seems what you get there depends on the client, but I did:
tystie% curl -o foo "http://httpbin.org/gzip"
tystie% file foo
foo: gzip compressed data, last modified: Wed May 2 17:06:24 2012, max compression
and the final part worried me: I do not know if memDecompress() knows
about that format. The help page does not claim it can do anything
other than de-compress the results of memCompress() (although past
experience has shown that it can in some cases). gzfile() supports a
much wider range of formats.
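
One in-memory route consistent with the gzfile() remark is to wrap the raw vector in rawConnection() and read it through gzcon(), which handles the gzip header itself. What follows is only a minimal sketch under stated assumptions: the sample bytes are generated locally with gzfile() rather than fetched from httpbin.org, and the names gz_path, gz_bytes and decoded are illustrative, not from the thread.

```r
# Assumption: build a small gzip-framed payload locally so the example
# runs without network access; the real case would use the bytes
# obtained from readBin() on the URL.
gz_path <- tempfile(fileext = ".gz")
out <- gzfile(gz_path, "w")
writeLines(c("hello", "world"), out)
close(out)

# Read the compressed file back as a raw vector, as in the original post.
gz_bytes <- readBin(gz_path, "raw", n = file.info(gz_path)$size)

# Sanity check: RFC 1952 magic bytes id1 and id2.
stopifnot(identical(gz_bytes[1:2], as.raw(c(0x1f, 0x8b))))

# Decompress without touching the disk: a raw connection over the bytes,
# wrapped in gzcon() so the gzip framing is parsed on read.
in_con <- gzcon(rawConnection(gz_bytes))
decoded <- readLines(in_con)
close(in_con)  # closing the gzcon also closes the wrapped connection

print(decoded)
```

Whether this copes with every variant a server might send is not something the sketch establishes; it only shows the pattern on data written by gzfile().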
--
Brian D. Ripley, ripley at stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595