[R] Rcompression and Java Deflator

Stabler, Ben Stabler at pbworld.com
Wed May 6 01:41:20 CEST 2009

(this may be a duplicate post since I attached a file to the previous try...sorry about that)

Below are the first few lines of a zlib-compressed byte array written from Java with the Deflater class.

> readBin("row_1",raw(),10000000)
   [1] 4c 45 50 e2 49 d5 86 bc 48 a1 32 5d 49 9d f5 90 48 e0 14 33 49 8f 54 6a 49 77 c9 48 48 d9 ec 56 47 91 48 f0 47 25 56 ef 47 b8 f5 7b 46 35 25 00 47 73 11 c5 48 6c 8e b9 47 ca 71 92 46 8d dc aa 45 92 0e

I’m trying to read it into R with Rcompression, but I can't get it to work. I think this may be because Java’s Deflater class, when its nowrap parameter is true (see below), writes the data without the ZLIB header and checksum. I can't change the Java code that creates the files.

As I understand it, uncompress() reads a zlib stream (with header) and gunzip() reads a gzip stream (with header). Is there a way to read the payload without headers? It is my understanding that the payload (minus the headers) is the same for gzip and zlib. The Ruby thread quoted at the bottom seems to be related. Thanks for any help!

> compressedData = readBin("row_1",raw(),10000000)
> uncompress(compressedData)
Error in uncompress(compressedData) : corrupted compressed (gzip) source

> gunzip(compressedData)
Error in gunzip(compressedData) : 
  Failed to uncompress the raw data: (-3) incorrect header check
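
One way to check the "incorrect header check" message is to look at the first two bytes of the file. The sketch below is in Java only because the data comes from Java; the class and method names are made up for illustration. It tests for the gzip magic bytes (RFC 1952) and for a valid zlib CMF/FLG pair (RFC 1950), using the first bytes of the dump above as input.

```java
class StreamHeaderCheck {
    // RFC 1952: a gzip member starts with the magic bytes 0x1f 0x8b.
    static boolean looksLikeGzip(byte[] data) {
        return data.length >= 2
            && (data[0] & 0xff) == 0x1f
            && (data[1] & 0xff) == 0x8b;
    }

    // RFC 1950: a zlib stream starts with CMF and FLG. The low nibble of CMF
    // must be 8 (deflate) and CMF * 256 + FLG must be a multiple of 31.
    static boolean looksLikeZlib(byte[] data) {
        if (data.length < 2) {
            return false;
        }
        int cmf = data[0] & 0xff;
        int flg = data[1] & 0xff;
        return (cmf & 0x0f) == 8 && ((cmf << 8) | flg) % 31 == 0;
    }

    public static void main(String[] args) {
        // First four bytes of the readBin() output shown in this post.
        byte[] firstBytes = {0x4c, 0x45, 0x50, (byte) 0xe2};
        System.out.println("gzip header: " + looksLikeGzip(firstBytes));
        System.out.println("zlib header: " + looksLikeZlib(firstBytes));
    }
}
```

Both checks come back false for 4c 45, so the file starts with neither a gzip nor a zlib header. That fits the errors above, but it does not by itself show what the bytes are.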


Java Deflater

public Deflater(int level, boolean nowrap)

Creates a new compressor using the specified compression level. If 'nowrap' is true then the ZLIB header and checksum fields will not be used in order to support the compression format used in both GZIP and PKZIP.

level - the compression level (0-9)
nowrap - if true then use GZIP compatible compression

http://java.sun.com/j2se/1.5.0/docs/api/java/util/zip/Deflater.html#Deflater(int, boolean)
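
To illustrate the nowrap behaviour described above, here is a minimal Java sketch. It assumes the writer used new Deflater(level, true); the post does not show the actual creation code, and the class and helper names are hypothetical. Data written with nowrap = true inflates with new Inflater(true), while an Inflater in the default zlib mode rejects it.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

class NowrapRoundTrip {
    // Compress with or without the ZLIB header and Adler-32 checksum.
    static byte[] deflate(byte[] input, boolean nowrap) {
        Deflater deflater = new Deflater(Deflater.DEFAULT_COMPRESSION, nowrap);
        deflater.setInput(input);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        while (!deflater.finished()) {
            int n = deflater.deflate(buf);
            out.write(buf, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    // Decompress; nowrap must match the setting used by the writer.
    static byte[] inflate(byte[] compressed, boolean nowrap) throws DataFormatException {
        Inflater inflater = new Inflater(nowrap);
        // The Inflater Javadoc asks for one extra dummy byte when nowrap is true;
        // it is harmless in zlib mode, so it is always appended here.
        inflater.setInput(Arrays.copyOf(compressed, compressed.length + 1));
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(buf);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new DataFormatException("truncated or dictionary-dependent stream");
                }
                out.write(buf, 0, n);
            }
        } finally {
            inflater.end();
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws DataFormatException {
        byte[] original = "hello hello hello hello".getBytes(StandardCharsets.UTF_8);
        byte[] raw = deflate(original, true);      // headerless (nowrap) output
        byte[] wrapped = deflate(original, false); // standard zlib output

        System.out.println("raw deflate bytes:  " + raw.length);
        System.out.println("zlib-wrapped bytes: " + wrapped.length);
        System.out.println("round trip ok: " + Arrays.equals(original, inflate(raw, true)));
        try {
            inflate(raw, false);
            System.out.println("zlib-mode inflater accepted raw data");
        } catch (DataFormatException e) {
            System.out.println("zlib-mode inflater rejected raw data: " + e.getMessage());
        }
    }
}
```

If the file really was written this way, a reader on the R side would need the equivalent of Inflater(true), i.e. a decompressor that expects no header.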


These threads also seem to be dealing with the same issue….



The Ruby thread says “As could be seen in your first post, you are using -MAX_WBITS, which enables old (headerless? don't know what it's called) zlib format, that has no gzip header and no checksum. Maybe you should be using +MAX_WBITS (the default), which adds necessary header and checksum.”

Ben Stabler
Systems Analysis Group
Parsons Brinckerhoff

