[R] unz() ignores encoding argument
Stefan Evert
stefanML at collocations.de
Mon Sep 20 15:39:18 CEST 2010
Hi!
I'm trying to read individual files from a ZIP archive, using the unz() function. Some of the files contain non-ASCII characters and I'd like to avoid unpacking them in a temporary directory.
My problem is that unz() seems to ignore the encoding="latin1" option I need to read the non-ASCII characters properly. I can't find a clear indication in the documentation that this is expected behaviour, except for the remark that "unz reads (only) single files within zip files, in binary mode" (and a short comment further below that re-encoding only works for text connections).
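One possible workaround, sketched here under stated assumptions rather than as a definitive fix, is to read the archive member without any encoding argument and convert the strings explicitly with iconv(). The file names (demo.zip, sample.txt) are hypothetical. utils::zip() is used only to build a test archive; it needs an external zip program on the PATH and may not exist in older R versions.

```r
# Sketch only: file names are hypothetical, and utils::zip() requires an
# external 'zip' program to be available on the PATH.
tmp <- tempfile("unzdemo")
dir.create(tmp)

# Reference string (UTF-8) and its Latin-1 byte representation.
txt <- "caf\u00e9 na\u00efve"
raw_latin1 <- iconv(txt, from = "UTF-8", to = "latin1", toRaw = TRUE)[[1]]

# Write the Latin-1 bytes plus a trailing newline, then zip the file.
writeBin(c(raw_latin1, as.raw(0x0a)), file.path(tmp, "sample.txt"))
zipfile <- file.path(tmp, "demo.zip")
utils::zip(zipfile, file.path(tmp, "sample.txt"), flags = "-jq")

# Read the member straight from the archive, without relying on
# unz(..., encoding = "latin1"), and re-encode the result by hand.
con <- unz(zipfile, "sample.txt")
lines <- readLines(con)
close(con)
fixed <- iconv(lines, from = "latin1", to = "UTF-8")
```

This avoids unpacking to a temporary directory when reading, at the cost of converting every character vector by hand after it has been read.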
Digging a bit in the source code, I found that the ultimate cause seems to be this statement in the C-level function unz_open(), at line 359 of src/main/dounzip.c:
> /* set_iconv(); not yet */
Any ideas why this is commented out? The previous lines set up con->text appropriately and con->encname was set by do_unz(), so I don't see an obvious reason why the iconv layer can't be added.
I'm working with R 2.11.1
>                _
> platform       i386-apple-darwin9.8.0
> arch           i386
> os             darwin9.8.0
> system         i386, darwin9.8.0
> status
> major          2
> minor          11.1
> year           2010
> month          05
> day            31
> svn rev        52157
> language       R
> version.string R version 2.11.1 (2010-05-31)
but have been looking at the current R-devel source code, so I suspect my problem won't just go away with the next release.
Best regards,
Stefan Evert
[ stefan.evert at uos.de | http://purl.org/stefan.evert ]