[R] Encoding issue
Ivan Krylov
kry|ov@r00t @end|ng |rom gm@||@com
Mon Nov 5 20:34:02 CET 2018
On Mon, 5 Nov 2018 08:36:13 -0500 (EST)
Sebastien Bihorel <sebastien.bihorel using cognigencorp.com> wrote:
> [1] "râs"
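Interesting. This is what I get if I decode the bytes 72 e2 80 99 73 0a
as latin-1 instead of UTF-8. In UTF-8, the bytes e2 80 99 encode a single
character, U+2019 RIGHT SINGLE QUOTATION MARK. Decoded as latin-1, the
result looks like only three characters, but there are actually more: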
$ perl -CSD -Mcharnames=:full -MEncode=decode \
-E'for (split //, decode latin1 => pack "H*", "72e28099730a")
{ say ord, " ", $_, " ", charnames::viacode(ord) }'
114 r LATIN SMALL LETTER R
226 â LATIN SMALL LETTER A WITH CIRCUMFLEX
128 PADDING CHARACTER
153 SINGLE GRAPHIC CHARACTER INTRODUCER
115 s LATIN SMALL LETTER S
10
LINE FEED
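For comparison, the same check can be sketched in R itself. This is only a
minimal sketch: the variable names are mine, and it assumes that iconv() on
your platform accepts "latin1" as an encoding name.

```r
# The six bytes in question: "r", a three-byte UTF-8 sequence, "s", newline
bytes <- as.raw(c(0x72, 0xe2, 0x80, 0x99, 0x73, 0x0a))
raw_string <- rawToChar(bytes)

# Misreading the bytes as latin-1: every byte becomes its own character,
# so six code points come out (two of them invisible)
as_latin1 <- iconv(raw_string, from = "latin1", to = "UTF-8")
latin1_points <- utf8ToInt(as_latin1)
print(latin1_points)

# Reading the same bytes as UTF-8: e2 80 99 collapses into one code point,
# 8217 (U+2019 RIGHT SINGLE QUOTATION MARK)
utf8_points <- utf8ToInt(raw_string)
print(utf8_points)
```

The second vector is shorter than the first, which is the difference
between the garbled string and the intended one.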
Does it help if you explicitly specify the file encoding by passing the
fileEncoding="UTF-8" argument to scan()?
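A minimal sketch of that suggestion, using a temporary file as a stand-in
for your real one (the path and variable names here are hypothetical). The
result depends on your session locale: fileEncoding makes R re-encode the
input from UTF-8 to the native encoding, so a UTF-8-capable locale is
assumed.

```r
# Write the six bytes to a temporary file (stand-in for the real data file)
bytes <- as.raw(c(0x72, 0xe2, 0x80, 0x99, 0x73, 0x0a))
path <- tempfile(fileext = ".txt")
writeBin(bytes, path)

# Tell scan() that the file is UTF-8 instead of letting it assume
# the native encoding of the session
x <- scan(path, what = character(), fileEncoding = "UTF-8", quiet = TRUE)
print(x)

# Inspect the code points actually read; enc2utf8() converts the string
# to UTF-8 first so that utf8ToInt() can interpret it
print(utf8ToInt(enc2utf8(x)))

unlink(path)
```

If the second print shows a single code point between 114 and 115 rather
than three, the file was decoded as UTF-8.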
--
Best regards,
Ivan