[R] Rcompression and Java Deflator

Andrew Dunn adunn at mango-solutions.com
Wed May 6 12:34:25 CEST 2009

Hi Ben,

I have successfully used Rcompression with archives created in Java. I did not use the Deflater class for this, but rather GZIPOutputStream (http://java.sun.com/j2se/1.5.0/docs/api/java/util/zip/GZIPOutputStream.html) in combination with a Java-based tar archive generator (http://www.trustice.com/java/tar/) to create gzipped tar files. The trustice tar API is very similar to the Java ZIP API and is fairly simple to pick up.
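
As a rough illustration of the GZIPOutputStream route, here is a minimal sketch that writes a byte array in gzip format and reads it back. It leaves out the tar step, and the class and method names (GzipRoundTrip, gzip, gunzip) are made up for the example. It also uses try-with-resources, so it assumes Java 7 or later rather than the 1.5 API linked above.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper class, for illustration only.
public class GzipRoundTrip {

    // Compress bytes into gzip format (gzip header, deflate data, CRC-32 trailer).
    static byte[] gzip(byte[] input) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(buffer)) {
            out.write(input);
        } // closing the stream finishes the deflate data and writes the trailer
        return buffer.toByteArray();
    }

    // Decompress gzip-format bytes back into the original data.
    static byte[] gunzip(byte[] compressed) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPInputStream in =
                 new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            byte[] chunk = new byte[4096];
            int n;
            while ((n = in.read(chunk)) != -1) {
                buffer.write(chunk, 0, n);
            }
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] original = "hello from Java".getBytes(StandardCharsets.UTF_8);
        byte[] packed = gzip(original);
        // A gzip stream starts with the magic bytes 1f 8b.
        System.out.printf("%02x %02x%n", packed[0] & 0xff, packed[1] & 0xff);
        System.out.println(new String(gunzip(packed), StandardCharsets.UTF_8));
    }
}
```

Output written this way carries the gzip header, so it is the kind of data a gzip reader expects.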




-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of Stabler, Ben
Sent: 06 May 2009 00:41
To: r-help at r-project.org
Subject: [R] Rcompression and Java Deflator

(this may be a duplicate post since I attached a file to the previous try...sorry about that)

Below are the first few lines of a zlib-compressed byte array written from Java with the Deflater class.

> readBin("row_1",raw(),10000000)
   [1] 4c 45 50 e2 49 d5 86 bc 48 a1 32 5d 49 9d f5 90 48 e0 14 33 49 8f 54 6a 49 77 c9 48 48 d9 ec 56 47 91 48 f0 47 25 56 ef 47 b8 f5 7b 46 35 25 00 47 73 11 c5 48 6c 8e b9 47 ca 71 92 46 8d dc aa 45 92 0e

I'm trying to read it into R with Rcompression and I can't get it to work. I think it may be because Java's Deflater class, when its nowrap parameter is true (see below), writes the data without the header and checksum. I can't change the Java creation code. I think uncompress() reads a zlib stream (with headers) and gunzip() reads a gzip stream (with headers). Is there a way to read the payload without headers? It is my understanding that the payload (minus the headers) is the same for gzip and zlib. The Ruby thread at the bottom seems to be related. Thanks for any help!

> compressedData = readBin("row_1",raw(),10000000)
> uncompress(compressedData)
Error in uncompress(compressedData) : corrupted compressed (gzip) source

> gunzip(compressedData)
Error in gunzip(compressedData) : 
  Failed to uncompress the raw data: (-3) incorrect header check


Java Deflater

public Deflater(int level, boolean nowrap)

Creates a new compressor using the specified compression level. If 'nowrap' is true then the ZLIB header and checksum fields will not be used in order to support the compression format used in both GZIP and PKZIP.

level - the compression level (0-9)
nowrap - if true then use GZIP compatible compression

http://java.sun.com/j2se/1.5.0/docs/api/java/util/zip/Deflater.html#Deflater(int, boolean)
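
To make the effect of nowrap concrete, the following sketch compresses the same input twice, once with nowrap = false and once with nowrap = true, and prints the leading bytes of each result. The class and method names (NowrapDemo, deflate) are invented for this example. With nowrap = false the output is a zlib stream, which per the zlib format has a two-byte header and a four-byte Adler-32 trailer. With nowrap = true only the raw deflate data is written.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.Deflater;

// Hypothetical demo class, for illustration only.
public class NowrapDemo {

    // Compress input with java.util.zip.Deflater, with or without the zlib wrapper.
    static byte[] deflate(byte[] input, boolean nowrap) {
        Deflater deflater = new Deflater(Deflater.DEFAULT_COMPRESSION, nowrap);
        deflater.setInput(input);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[1024];
        while (!deflater.finished()) {
            int n = deflater.deflate(buffer);
            out.write(buffer, 0, n);
        }
        deflater.end(); // release native resources
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] input = "some sample text, some sample text"
                .getBytes(StandardCharsets.UTF_8);
        byte[] wrapped = deflate(input, false); // zlib header + data + Adler-32
        byte[] raw = deflate(input, true);      // raw deflate data only

        // A zlib stream written with the default 32K window starts with 0x78.
        System.out.printf("wrapped starts with %02x, length %d%n",
                wrapped[0] & 0xff, wrapped.length);
        System.out.printf("raw starts with     %02x, length %d%n",
                raw[0] & 0xff, raw.length);
    }
}
```

Checking whether a file begins with 0x78 (zlib) or 1f 8b (gzip) is a quick way to guess which wrapper, if any, it carries.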


These threads also seem to be dealing with the same issue....



The Ruby thread says "As could be seen in your first post, you are using -MAX_WBITS, which enables old (headerless? don't know what it's called) zlib format, that has no gzip header and no checksum. Maybe you should be using +MAX_WBITS (the default), which adds necessary header and checksum."
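
On the Java side, the counterpart of the -MAX_WBITS setting mentioned in that quote appears to be the nowrap flag of java.util.zip.Inflater. The sketch below is not an R solution. It only illustrates that headerless deflate data is readable once the reader is told not to expect a wrapper, and that the same bytes are rejected by a reader expecting a zlib header. The names RawInflateDemo, rawDeflate and inflate are hypothetical. The extra zero byte follows the note in the Inflater Javadoc about supplying a dummy byte when nowrap is true.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical demo class, for illustration only.
public class RawInflateDemo {

    // Produce headerless deflate data (nowrap = true), as in the question.
    static byte[] rawDeflate(byte[] input) {
        Deflater deflater = new Deflater(Deflater.DEFAULT_COMPRESSION, true);
        deflater.setInput(input);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[1024];
        while (!deflater.finished()) {
            out.write(buffer, 0, deflater.deflate(buffer));
        }
        deflater.end();
        return out.toByteArray();
    }

    // Inflate data, either as raw deflate (nowrap = true) or as a zlib stream.
    static byte[] inflate(byte[] data, boolean nowrap) throws DataFormatException {
        Inflater inflater = new Inflater(nowrap);
        // The Javadoc asks for one extra dummy byte of input when nowrap is true.
        byte[] input = nowrap ? Arrays.copyOf(data, data.length + 1) : data;
        inflater.setInput(input);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[1024];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(buffer);
                if (n == 0 && !inflater.finished()
                        && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new DataFormatException("truncated or dictionary-based stream");
                }
                out.write(buffer, 0, n);
            }
        } finally {
            inflater.end(); // release native resources
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws DataFormatException {
        byte[] raw = rawDeflate("headerless data".getBytes(StandardCharsets.UTF_8));

        // Reading as raw deflate succeeds.
        System.out.println(new String(inflate(raw, true), StandardCharsets.UTF_8));

        // Reading the same bytes as a zlib stream fails on the missing header.
        try {
            inflate(raw, false);
        } catch (DataFormatException e) {
            System.out.println("zlib reader rejected it: " + e.getMessage());
        }
    }
}
```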

Ben Stabler
Systems Analysis Group
Parsons Brinckerhoff
