[Rd] Sweave output encoding in R-2.10.0beta on Windows (Rgui <-> Rterm)

Martin Becker martin.becker at mx.uni-saarland.de
Mon Oct 19 14:09:52 CEST 2009


Dear developers,

I am not really sure what causes the difference in the encoding of 
Sweave Soutput environments between Rgui.exe and R.exe/Rterm.exe in 
R-2.10.0beta (now R-2.10.0rc), but I suspect that the different 
behaviour of R-2.9.2pat and R-2.10.0rc is caused by the changes to 
regular expressions documented in NEWS (RweaveLatexRuncode uses sub() 
in several places).
AFAICS, sub() in R-2.10.0rc may now convert its input to UTF-8, and a 
(conditional) back-conversion after the sub() calls seems to resolve 
the encoding problems (as well as the different behaviour of Rgui and 
Rterm in R-2.10.0rc).
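
To make the suspected mechanism concrete, here is a rough model in Python. It is neither R code nor the attached patch. The function names sub_returning_utf8 and back_convert are hypothetical, and the premise that the substitution hands back UTF-8 is the suspicion stated above, not confirmed behaviour. It shows why UTF-8 bytes end up in an otherwise cp1252/latin1 file (a latin1 .tex file then shows "Ã¶" instead of "ö"), and how a back-conversion that runs only when the session encoding is not UTF-8 restores the original bytes.

```python
import re

# Assumption: the session encoding is Windows code page 1252, matching the
# German_Germany.1252 locale reported by sessionInfo().
SESSION_ENCODING = "cp1252"


def sub_returning_utf8(pattern: str, repl: str, line: bytes, encoding: str) -> bytes:
    """Hypothetical model of the suspected behaviour: decode the input from the
    session encoding, replace the first match (like R's sub()), and return the
    result encoded as UTF-8 regardless of the input encoding."""
    text = line.decode(encoding)
    return re.sub(pattern, repl, text, count=1).encode("utf-8")


def back_convert(line: bytes, encoding: str) -> bytes:
    """Conditional back-conversion: re-encode UTF-8 bytes into the session
    encoding, unless the session already uses UTF-8."""
    if encoding.lower().replace("-", "").replace("_", "") == "utf8":
        return line
    return line.decode("utf-8").encode(encoding)


if __name__ == "__main__":
    original = "Größe  ".encode(SESSION_ENCODING)  # single-byte umlauts
    # Strip trailing whitespace as a stand-in for one sub() call on the output.
    converted = sub_returning_utf8(r"\s+$", "", original, SESSION_ENCODING)
    restored = back_convert(converted, SESSION_ENCODING)
    print(converted)  # b'Gr\xc3\xb6\xc3\x9fe'  (UTF-8, two bytes per umlaut)
    print(restored)   # b'Gr\xf6\xdfe'          (back in cp1252)
```

Making the back-conversion conditional means a UTF-8 session is left untouched, while single-byte locales such as German_Germany.1252 get their original bytes back.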

It would be great if someone more involved in Sweave could take a look 
at (and maybe commit) the attached (untested!) patch (against r50160). 
Many thanks in advance!
Best wishes,

  Martin


Martin Becker wrote:
> Dear developers,
>
> I have come across a (somewhat strange) change in the encoding of 
> Sweave output from R-2.9.2pat to R-2.10.0beta (apparently specific to 
> Rgui) on Windows installations. Of course, the NEWS file contains 
> quite a few changes concerning encoding, but I was not able to locate 
> an entry which explains the observed behaviour. I am not very familiar 
> with encodings/locales/codepages, but I will try to explain my 
> observations as best I can.
>
> In R-2.9.2pat, when invoking R via Rgui --vanilla (output of 
> sessionInfo() below), the output of Sweave for .rnw files containing 
> German umlauts (latin1-encoded) is again latin1-encoded (the resulting 
> .tex file compiles with \usepackage[latin1]{inputenc} and 
> \usepackage[german]{babel}).
> In R-2.10.0beta, however, when invoking R via Rgui --vanilla (output 
> of sessionInfo() below), part of Sweave's output is UTF-8-encoded 
> (with some extra characters at the start and the end, which could be 
> BOMs): more precisely, the Soutput environments containing German 
> umlauts, whereas Sinput environments with umlauts are still latin1. 
> Surprisingly, when R is invoked from the (Windows) command line (R 
> --vanilla or Rterm --vanilla), the encoding is completely latin1 again 
> (as in R-2.9.2pat). So the switch to UTF-8 for parts of Sweave's 
> output seems to be specific to Rgui.
>
> Of course, I can work around this problem by using Rterm instead of 
> Rgui when running Sweave, but I am not sure whether the current 
> behaviour of R via Rgui is intended.
> I will try to attach the .rnw file as well as the resulting .tex 
> files (and hope that the attachments get through).
>
> Best wishes,
>
>   Martin
>
>
>
> sessionInfo() for R-2.9.2pat (same for Rgui, R, Rterm):
> R version 2.9.2 Patched (2009-09-24 r50041)
> i386-pc-mingw32
>
> locale:
> LC_COLLATE=German_Germany.1252;LC_CTYPE=German_Germany.1252;LC_MONETARY=German_Germany.1252;LC_NUMERIC=C;LC_TIME=German_Germany.1252 
>
>
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
>
> sessionInfo() for R-2.10.0beta (same for Rgui, R, Rterm):
> R version 2.10.0 beta (2009-10-11 r50037)
> i386-pc-mingw32
>
> locale:
> [1] LC_COLLATE=German_Germany.1252  LC_CTYPE=German_Germany.1252
> [3] LC_MONETARY=German_Germany.1252 LC_NUMERIC=C
> [5] LC_TIME=German_Germany.1252
>
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base 
>
> ------------------------------------------------------------------------
>
> ______________________________________________
> R-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel


-- 
Dr. Martin Becker
Statistics and Econometrics
Saarland University
Campus C3 1, Room 206
66123 Saarbruecken
Germany

-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: sweave-patch.txt
URL: <https://stat.ethz.ch/pipermail/r-devel/attachments/20091019/6b93e60c/attachment.txt>

