[ESS] Re: [R] Strange characters in 2.1.0?

Martin Maechler maechler at stat.math.ethz.ch
Wed Jun 8 11:20:21 CEST 2005


>>>>> "PaCo" == Patrick Connolly <p.connolly at hortresearch.co.nz>
>>>>>     on Wed, 8 Jun 2005 11:31:44 +1200 writes:

    PaCo> On Tue, 07-Jun-2005 at 04:10PM +0200, Martin Maechler wrote:
    PaCo> |> >>>>> "Dan" == Dan Bolser <dmb at mrc-dunn.cam.ac.uk>

    PaCo> |>       ..........
    PaCo> |> 
    PaCo> |>     Dan> I have gone back to 2.0.0 :)
    PaCo> |> 
    PaCo> |> Don't do that!
    PaCo> |> You've lost tons of nice new features and gained quite an amount
    PaCo> |> of old bugs by downgrading .. 

    PaCo> I get the non-generic quotes to show on the screen, but they won't
    PaCo> print with enscript.  I end up with a lot of wrapped lines and
    PaCo> nonsense where an unknown character should be.

Why has this been diverted from R-help to ESS-help?
Printing with enscript is just as relevant for printing a
transcript such as 'output.Rout', produced e.g. by R CMD BATCH input.R output.Rout.
I'm committing a cross-posting felony now by posting back
to R-help {and please drop ESS-help from "cc" when replying further}....

    PaCo> What do I need to do to get enscript to know about such characters?
    PaCo> There is an encoding parameter which defaults to latin1.  Should I
    PaCo> change that to something?

Yes, in principle.  "latin1", aka ISO-latin-1, aka ISO-8859-1,
is (for western European languages) the predecessor of
the newer Unicode standard, for which we use the UTF-8 encoding
{and the above is much oversimplified; "locale" settings and
standards also come into play}
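
To illustrate the difference, here is a small Python sketch (the sample characters are arbitrary choices): 'ä' is one byte (e4) in latin1 but two bytes (c3 a4) in UTF-8, while a directional quote such as U+2018 needs three bytes in UTF-8 and has no latin1 code at all. A latin1-only tool therefore reads those three bytes as three separate, mostly unprintable characters, which plausibly accounts for the "nonsense" enscript prints.

```python
from typing import Optional, Tuple


def encode_both(ch: str) -> Tuple[bytes, Optional[bytes]]:
    """Return the UTF-8 bytes of ch and its latin1 bytes (None if latin1 lacks ch)."""
    utf8 = ch.encode("utf-8")
    try:
        latin1: Optional[bytes] = ch.encode("latin-1")
    except UnicodeEncodeError:
        latin1 = None  # the character has no ISO-8859-1 code point
    return utf8, latin1


# Illustrative samples: plain 'a', a-umlaut, and a left single quotation mark.
for ch in ["a", "\u00e4", "\u2018"]:
    utf8, latin1 = encode_both(ch)
    shown = latin1.hex() if latin1 is not None else "n/a"
    print(f"U+{ord(ch):04X}  utf-8: {utf8.hex()}  latin1: {shown}")

# A latin1-only tool decodes UTF-8 input one byte per character,
# so a single directional quote turns into three characters.
garbled = "\u2018".encode("utf-8").decode("latin-1")
print(len(garbled), repr(garbled))
```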

However, my version of enscript does not seem to support UTF-8 (yet).
Nor does 'a2ps', an alternative to enscript which can
pretty-print R source files.

So there are basically two options:

1) Get rid of Unicode / UTF-8
   by setting the locale of your computer / login
   to one of the "old" locales, e.g. en_US instead of en_US.utf-8.
   This will be more or less fine for Emacs and R, though in
   our {Red Hat Enterprise} setup, the X11 fonts for
   non-UTF-8 locales are quite crippled compared to those for
   UTF-8 ones.

   However, as more and more other utilities are based on UTF-8
   encoded files, you will see funny characters in their output,
   at least if you are using locales like "de_*" or "fr_*",
   e.g. in man pages, which are only available in UTF-8 in our Red Hat OS setup.

2) Improve the printing tools by 
    a) filtering *.utf-8 to latin-* 
    b) printing the resulting latin-*

   For filtering, there are programs like 'recode' (formerly "GNU
   recode", now "Free recode"), which is extremely flexible, and
   'iconv' (less flexible but more widespread); both can translate
   UTF-8 to and from all kinds of encodings / character sets.
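
Steps a) and b) can be sketched in Python as a stand-in for recode or iconv. This is a minimal sketch, not how either tool works internally: the ASCII substitutions in TRANSLIT are hypothetical choices for illustration, and any other character without a latin1 equivalent is written as '?'.

```python
import sys

# Hypothetical ASCII stand-ins for common typographic characters that
# latin1 cannot represent; extend the table as needed.
TRANSLIT = str.maketrans({
    "\u2018": "'", "\u2019": "'",   # directional single quotes
    "\u201c": '"', "\u201d": '"',   # directional double quotes
    "\u2013": "-", "\u2014": "--",  # en dash, em dash
})


def utf8_to_latin1(data: bytes) -> bytes:
    """Step a): decode UTF-8, substitute known characters, re-encode as latin1.

    Any remaining character outside latin1 is written as '?'.
    """
    text = data.decode("utf-8").translate(TRANSLIT)
    return text.encode("latin-1", errors="replace")


def convert_file(src: str, dst: str) -> None:
    """Write a latin1 copy of the UTF-8 file src to dst, ready for enscript."""
    with open(src, "rb") as fin:
        data = fin.read()
    with open(dst, "wb") as fout:
        fout.write(utf8_to_latin1(data))


if __name__ == "__main__":
    if len(sys.argv) == 3:
        convert_file(sys.argv[1], sys.argv[2])
    else:
        print("usage: utf8_to_latin1.py INPUT OUTPUT", file=sys.stderr)
```

Usage would be something like `python3 utf8_to_latin1.py output.Rout output-latin1.Rout` followed by `enscript output-latin1.Rout` (the script name is hypothetical). With GNU libc's iconv, `iconv -f UTF-8 -t ISO-8859-1//TRANSLIT output.Rout` should do a comparable conversion without Python.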

In the future, of course, everything will work out of the box,
when all the utilities on your computer are aware of UTF
encodings and automatically send correct output to the
printer and display it correctly in all kinds of viewers/editors... :-)

Given my experiences during the last several months
(where I, e.g., also found that our oldish LaTeX setup
didn't yet accept \usepackage[utf8]{inputenc}),
if I were in New Zealand and did not need accents or umlauts,
I'd probably stick with latin1 (and make sure my X
server got proper non-UTF-8 fonts) for another year or so.

Martin
