[Rd] Sweave output encoding in R-2.10.0beta on Windows (Rgui <-> Rterm)
Martin Becker
martin.becker at mx.uni-saarland.de
Mon Oct 19 14:09:52 CEST 2009
Dear developers,
I am not really sure what causes the difference in the encoding of
Sweave Soutput environments between Rgui.exe and R.exe/Rterm.exe in
R-2.10.0beta (now R-2.10.0rc), but I suppose that the different
behaviour of R-2.9.2pat and R-2.10.0rc is caused by changes concerning
regular expressions (RweaveLatexRuncode uses sub() in some places) as
documented in NEWS.
AFAICS, sub() now (R-2.10.0rc) possibly converts its input to UTF-8, and
a (conditional) back-conversion after the sub()-commands seems to
resolve the encoding problems (as well as the different behaviour of
Rgui and Rterm in R-2.10.0rc).
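To make the idea concrete, here is a minimal sketch of such a conditional back-conversion. It is neither the attached patch nor actual Sweave code: sub_native is a hypothetical helper name, and the sketch assumes that a result which sub() marks as UTF-8, for an input that was not marked as UTF-8, should be converted back to the native encoding with iconv().

```r
# Hypothetical helper, not part of Sweave or of the attached patch.
# Applies sub() and converts results back to the native encoding when
# sub() returned them marked as UTF-8 although the input was not.
sub_native <- function(pattern, replacement, x, ...) {
    x <- as.character(x)
    out <- sub(pattern, replacement, x, ...)
    # Elements whose encoding mark changed to UTF-8 during sub().
    idx <- which(Encoding(out) == "UTF-8" & Encoding(x) != "UTF-8")
    if (length(idx) > 0L) {
        # to = "" means the encoding of the current locale.
        conv <- iconv(out[idx], from = "UTF-8", to = "")
        # Keep the UTF-8 version where the native encoding cannot represent it.
        ok <- !is.na(conv)
        out[idx[ok]] <- conv[ok]
    }
    out
}

x <- "M\xfcller"          # latin1 bytes for a name containing u-umlaut
Encoding(x) <- "latin1"
y <- sub_native("^M", "m", x)
Encoding(y)               # result depends on the locale and the R version
```

Whether the back-conversion is triggered at all depends on the locale and on how the running R version marks the output of sub(), so this only illustrates the approach and is no substitute for testing the patch itself.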
It would be great if someone more involved in Sweave could take a look
at (and maybe commit) the attached (untested!) patch (to r50160). Many
thanks in advance!
Best wishes,
Martin
Martin Becker wrote:
> Dear developers,
>
> I have come across a (somewhat strange) change in the encoding of
> Sweave output from R-2.9.2pat to R-2.10.0beta (apparently specific to
> Rgui) on Windows installations. Of course, the NEWS file contains
> quite a few changes concerning encoding, but I was not able to locate
> an entry which explains the observed behaviour. I am not very familiar
> with encodings/locales/codepages, but I will try to explain my
> observations as best I can.
>
> In R-2.9.2pat, when invoking R via Rgui --vanilla (output of
> sessionInfo() below), the output of Sweave for .rnw files containing
> German umlauts (latin1-encoded) is again latin1-encoded (the resulting
> .tex file compiles with \usepackage[latin1]{inputenc} and
> \usepackage[german]{babel}).
> In R-2.10.0beta, however, when invoking R via Rgui --vanilla (output
> of sessionInfo() below), some of Sweave's output is utf-8 encoded (with
> some extra characters at the start and the end, which could be BOMs).
> More precisely, this affects Soutput environments containing German
> umlauts; Sinput environments with German umlauts are still latin1.
> Surprisingly, when R is invoked from (Windows) command line (R
> --vanilla or Rterm --vanilla), the encoding is completely latin1 again
> (as in R-2.9.2pat). So, the change to utf-8 encoding for parts of
> Sweave's output seems to be specific to Rgui.
>
> Of course, I can work around this problem by using Rterm instead of
> Rgui when Sweaving, but I am not sure whether the current behaviour of R
> via Rgui is as intended.
> I will try to attach the .rnw file as well as the resulting .tex
> files (and hope that the attachments pass through).
>
> Best wishes,
>
> Martin
>
>
>
> sessionInfo() for R-2.9.2pat (same for Rgui, R, Rterm):
> R version 2.9.2 Patched (2009-09-24 r50041)
> i386-pc-mingw32
>
> locale:
> LC_COLLATE=German_Germany.1252;LC_CTYPE=German_Germany.1252;LC_MONETARY=German_Germany.1252;LC_NUMERIC=C;LC_TIME=German_Germany.1252
>
>
> attached base packages:
> [1] stats graphics grDevices utils datasets methods base
>
> sessionInfo() for R-2.10.0beta (same for Rgui, R, Rterm):
> R version 2.10.0 beta (2009-10-11 r50037)
> i386-pc-mingw32
>
> locale:
> [1] LC_COLLATE=German_Germany.1252  LC_CTYPE=German_Germany.1252
> [3] LC_MONETARY=German_Germany.1252 LC_NUMERIC=C
> [5] LC_TIME=German_Germany.1252
>
> attached base packages:
> [1] stats graphics grDevices utils datasets methods base
>
--
Dr. Martin Becker
Statistics and Econometrics
Saarland University
Campus C3 1, Room 206
66123 Saarbruecken
Germany
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: sweave-patch.txt
URL: <https://stat.ethz.ch/pipermail/r-devel/attachments/20091019/6b93e60c/attachment.txt>