[R] about opening R script Chinese annotation garble problem

Tue May 24 23:42:41 CEST 2022

On 4/25/22 19:52, Bill Dunlap wrote:
> If your file is encoded as UTF-8 (as most stuff on the internet is; there
> will be no null bytes in the file), then R-4.2.0 on a recent enough version
> of Windows can source() it without mentioning the encoding.

And the same applies to scripts used by Rgui via "Open script". For R 
4.2.0 on recent Windows, the script file must be encoded as UTF-8. I've 
tested it on Bill's example expression and it works on my system.
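
If the script was saved as UTF-16 (for example via an editor's "Unicode"
option), one way to get a UTF-8 copy is to re-encode it in R. What follows
is only a minimal sketch: the helper name utf16_to_utf8 and the temporary
file names are made up for illustration, and it assumes the input really is
UTF-16 with a byte-order mark, as in Bill's example file.

```r
# Hypothetical helper (not part of R): re-encode a UTF-16 file as UTF-8.
utf16_to_utf8 <- function(src, dst) {
  bytes <- readBin(src, what = "raw", n = file.size(src))
  # iconv() accepts a list of raw vectors; with from = "UTF-16" the
  # byte-order mark at the start of the data selects the byte order.
  txt <- iconv(list(bytes), from = "UTF-16", to = "UTF-8")
  if (is.na(txt)) stop("could not convert ", src, " from UTF-16")
  # Use plain LF line endings in the output.
  txt <- gsub("\r\n", "\n", txt, fixed = TRUE)
  out <- charToRaw(txt)
  # Drop a UTF-8 byte-order mark if the converter left one in place.
  bom <- as.raw(c(0xef, 0xbb, 0xbf))
  if (length(out) >= 3 && identical(out[1:3], bom)) out <- out[-(1:3)]
  writeBin(out, dst)
  invisible(dst)
}

# Demo: recreate the bytes of Bill's UTF-16 file, then convert it.
utf16_bytes <- as.raw(c(
  0xff, 0xfe, 0x73, 0x00, 0x20, 0x00, 0x3c, 0x00, 0x2d, 0x00, 0x20, 0x00,
  0x22, 0x00, 0x38, 0x6c, 0x1b, 0x52, 0x20, 0x00, 0x76, 0x00, 0x69, 0x00,
  0x61, 0x00, 0x20, 0x00, 0x52, 0x00, 0x2d, 0x00, 0x68, 0x00, 0x65, 0x00,
  0x6c, 0x00, 0x70, 0x00, 0x22, 0x00, 0x0d, 0x00, 0x0a, 0x00))
src <- tempfile(fileext = ".txt")
dst <- tempfile(fileext = ".R")
writeBin(utf16_bytes, src)
utf16_to_utf8(src, dst)

# The UTF-8 copy should contain no null bytes.
utf8_bytes <- readBin(dst, what = "raw", n = file.size(dst))
any(utf8_bytes == as.raw(0))
```

With R 4.2.0 on a recent Windows, the converted file should then be usable
with source(dst) or "Open script" in Rgui.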

However, please note that in R 4.2.0, running a complete line of the
script using Ctrl-R doesn't work if the line contains non-ASCII
characters. This is due to a bug which has already been fixed; the fix
will appear in R 4.2.1. In R 4.2.0, you need to select the text before
pressing Ctrl-R (or copy and paste it using Ctrl-C/Ctrl-V).
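
To see which lines of a script are affected, one can list the lines that
contain non-ASCII bytes. This is just a rough sketch: non_ascii_lines is a
made-up helper name, the sample script is invented for the demo, and the
file is assumed to be encoded as UTF-8.

```r
# Hypothetical helper (not part of R): indices of lines with non-ASCII bytes.
non_ascii_lines <- function(path) {
  # encoding = "UTF-8" only marks the strings; no conversion takes place.
  lines <- readLines(path, encoding = "UTF-8", warn = FALSE)
  has_non_ascii <- vapply(
    lines,
    function(x) any(as.integer(charToRaw(x)) > 127L),
    logical(1),
    USE.NAMES = FALSE
  )
  which(has_non_ascii)
}

# Demo with an invented three-line script; \u escapes keep this code ASCII.
script <- tempfile(fileext = ".R")
writeLines(
  c("x <- 1", "s <- \"\u6c38\u521b via R-help\"", "# plain comment"),
  script,
  useBytes = TRUE
)
non_ascii_lines(script)
```

The base R function tools::showNonASCIIfile() prints such lines directly,
if only a quick look is needed.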

Tomas

>
> -Bill
>
> On Mon, Apr 25, 2022 at 8:52 AM Bill Dunlap <williamwdunlap using gmail.com>
> wrote:
>
>> The answer depends on the encoding of the file containing the Chinese
>> characters and on the version of R (since you are using Windows).  I copied
>> your subject line into Wordpad and added some syntax to make a valid R
>> expression
>>    s <- "永创 via R-help"
>> I then saved it with the type "Unicode Text Document".  In my version of
>> Wordpad this means UTF-16.  The bytes in the file are
>>    4.2.0> readBin("Chinese-utf-16.txt", what="raw",
>>                   n=file.size("Chinese-utf-16.txt"))
>>     [1] ff fe 73 00 20 00 3c 00 2d 00 20 00 22 00 38 6c 1b 52
>>    [19] 20 00 76 00 69 00 61 00 20 00 52 00 2d 00 68 00 65 00
>>    [37] 6c 00 70 00 22 00 0d 00 0a 00
>> All the nulls in the file are a hint that this is encoded using UTF-16,
>> not UTF-8.
>>
>> With R-4.2.0 (released a few days ago) I can source the file with
>>    4.2.0> source("Chinese-utf-16.txt", encoding="UTF-16")
>>    4.2.0> s
>>    [1] "永创 via R-help"
>>    4.2.0> Encoding(s)
>>    [1] "UTF-8"
>>
>> With R-4.1.2 I get
>>    > source("Chinese-utf-16.txt", encoding="UTF-16")
>>    Error in source("Chinese-utf-16.txt", encoding = "UTF-16") :
>>      Chinese-utf-16.txt:1:6: unexpected INCOMPLETE_STRING
>>    1: s <- "
>>             ^
>>    In addition: Warning message:
>>    In readLines(file, warn = FALSE) :
>>      invalid input found on input connection 'Chinese-utf-16.txt'
>>    > source(file("Chinese-utf-16.txt", encoding="UTF-16"))
>>    > s
>>    [1] "<U+6C38><U+521B> via R-help"
>>    > source(file("Chinese-utf-16.txt", encoding="UTF-16"), encoding="UTF-8")
>>    > s
>>    [1] "永创 via R-help"
>>    > Encoding(s)
>>    [1] "UTF-8"
>>    > charToRaw(s)
>>     [1] e6 b0 b8 e5 88 9b 20 76 69 61 20 52 2d 68 65 6c 70
>>
>> R-4.2.0 makes this much easier.
>>
>> -Bill
>>
>> On Mon, Apr 25, 2022 at 1:04 AM 永创 via R-help <r-help using r-project.org>
>> wrote:
>>
>>> Garbled characters appear in the Chinese comments when I open a program
>>> script using RGui (see attached picture). I have tried a variety of
>>> methods, but none of them has solved it. I hope someone can help me
>>> solve this problem. Thank you.
>>> ______________________________________________
>>> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide
>>> http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>>