[Rd] Bug in memDecompress()
Olaf Mersmann
olafm at kimberly.tako.de
Fri May 7 17:27:39 CEST 2010
Dear R developers,
I have discovered a bug in the LZMA decompression code used by memDecompress() with type="xz". It is only triggered if the uncompressed content is more than three times as large as the compressed content. Here is a simple example that reproduces it:
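# Case 1: a highly compressible character string (2000 characters)
n <- 200
char <- paste(replicate(n, "1234567890"), collapse="")
char.comp <- memCompress(char, type="xz")
char.dec <- memDecompress(char.comp, type="xz", asChar=TRUE)
nchar(char.dec) == nchar(char)    # should be TRUE; FALSE means the output was truncated

# Case 2: the same string, serialized to a raw vector first
raw <- serialize(char, connection=NULL)
raw.comp <- memCompress(raw, type="xz")
raw.dec <- memDecompress(raw.comp, type="xz")
length(raw.dec) == length(raw)    # should be TRUE
char.uns <- unserialize(raw.dec)  # needs the complete raw vector to round-trip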
The root cause seems to be that lzma_code() returns LZMA_OK even if it could not decompress the whole content, in which case strm.avail_in is greater than zero. The following patch changes the respective if statements:
http://www.statistik.tu-dortmund.de/~olafm/temp/memdecompress.patch
It also contains a small fix from the xz upstream for an uninitialized field in lzma_stream.
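To illustrate the failure mode outside of R, here is a minimal sketch using Python's standard lzma module, which wraps the same liblzma library. It is not the code from the patch, only an analogy: a single decode call into a bounded output buffer returns without error but leaves the stream unfinished, so the caller has to keep going until the end of the stream is reported. The helper name decompress_fully, the chunk size of 256, and the output limit of three times the compressed size (chosen to mirror the threshold observed above) are assumptions made for this example.

```python
import lzma

# Same payload as the R example: 200 repetitions of "1234567890".
original = b"1234567890" * 200
compressed = lzma.compress(original, format=lzma.FORMAT_XZ)

# One call with a bounded output buffer (assumed here: 3x the input size).
# No exception is raised, yet the stream has not been fully decoded.
probe = lzma.LZMADecompressor(format=lzma.FORMAT_XZ)
one_shot = probe.decompress(compressed, max_length=3 * len(compressed))


def decompress_fully(data: bytes, chunk_size: int = 256) -> bytes:
    """Decode an .xz stream in bounded chunks until end of stream.

    Hypothetical helper for illustration; chunk_size must be positive.
    """
    dec = lzma.LZMADecompressor(format=lzma.FORMAT_XZ)
    out = bytearray()
    pending = data
    while not dec.eof:
        chunk = dec.decompress(pending, max_length=chunk_size)
        # Unconsumed input is buffered inside the decompressor,
        # so later calls only need to ask for more output.
        pending = b""
        out += chunk
        if not chunk and dec.needs_input:
            # No output and no buffered input left: the stream is incomplete.
            raise ValueError("truncated xz stream")
    return bytes(out)


if __name__ == "__main__":
    print("single bounded call complete:", probe.eof)
    print("bytes from single call:", len(one_shot), "of", len(original))
    print("loop result matches:", decompress_fully(compressed) == original)
```

The point of the sketch is the loop condition: a successful return from one decode call is not treated as completion, only the end-of-stream flag is.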
Cheers,
Olaf