[Rd] encoding argument of source() in 3.5.0
Martin Maechler
maechler at stat.math.ethz.ch
Mon Jun 4 10:44:11 CEST 2018
>>>>> peter dalgaard
>>>>> on Sun, 3 Jun 2018 23:51:24 +0200 writes:
> Looks like this actually comes from readLines(), nothing
> to do with source() as such: In current R-devel (still):
>> f <- file("http://home.versanet.de/~s-berman/source2.R", encoding="UTF-8")
>> readLines(f)
> character(0)
>> close(f)
>> f <- file("http://home.versanet.de/~s-berman/source2.R")
>> readLines(f)
> [1] "source.test2 <- function() {" " print(\"Non-ascii: äöüß\")"
> [3] "}"
> -pd
And it's not even readLines() as such, but rather how exactly the
connection is defined [even in your example above]:
> urlR <- "http://home.versanet.de/~s-berman/source2.R"
> readLines(urlR, encoding="UTF-8")
[1] "source.test2 <- function() {" " print(\"Non-ascii: äöüß\")"
[3] "}"
> f <- file(urlR, encoding = "UTF-8")
> readLines(f)
character(0)
and the same behavior shows up with scan() instead of readLines():
> scan(urlR,"") # works
Read 7 items
[1] "source.test2" "<-" "function()" "{"
[5] "print(\"Non-ascii:" "äöüß\")" "}"
> scan(f,"") # fails
Read 0 items
character(0)
>
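The comparison above can be wrapped in a small helper to rerun it against other URLs or R versions. This is only a sketch: compare_reads() is a hypothetical name, not part of R, and it merely counts the lines returned by each of the two ways of reading shown above.

```r
# Hypothetical helper (not part of R): read the same source twice,
# once by passing the path/URL directly to readLines(), and once via an
# explicit file() connection that is created with an encoding.
compare_reads <- function(src) {
  # Direct read: 'encoding' here only marks the resulting strings.
  direct <- readLines(src, encoding = "UTF-8", warn = FALSE)

  # Connection read: 'encoding' here asks the connection to re-encode input.
  con <- file(src, encoding = "UTF-8")
  on.exit(close(con))
  via_con <- readLines(con, warn = FALSE)

  c(direct = length(direct), via_connection = length(via_con))
}
```

If the behavior reported above is present, calling compare_reads(urlR) should return a zero for via_connection while direct is non-zero; for a local file both counts are expected to agree.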
So it seems as if the bug is in the file() [or url()] C code.
But then we also have to consider Windows, where I think most of the changes
happened during the R-3.4.4 --> R-3.5.0 transition.
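Until the connection code is sorted out, one possible workaround builds on the observation that readLines(urlR, encoding="UTF-8") does return the lines. The following is a minimal sketch under that assumption; source_utf8() is a hypothetical helper, not an R function, and it does not reproduce everything source() does (no echo, no srcrefs, no chdir).

```r
# Hypothetical workaround (not part of R): avoid file(url, encoding = ...)
# by reading the lines directly and parsing them from memory.
source_utf8 <- function(src, envir = globalenv()) {
  # 'encoding' in readLines() only marks the strings as UTF-8;
  # it does not set up a re-encoding connection.
  lines <- readLines(src, encoding = "UTF-8", warn = FALSE)

  # Parse the in-memory text instead of a connection.
  exprs <- parse(text = lines, keep.source = FALSE)

  # Evaluate each top-level expression in the target environment.
  for (e in exprs) eval(e, envir = envir)
  invisible(NULL)
}
```

Usage would be source_utf8(urlR) in place of source(urlR, encoding = "UTF-8"); whether the non-ascii characters then display correctly under MS-Windows has not been checked here.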
>> On 2 Jun 2018, at 15:37 , Stephen Berman <stephen.berman using gmx.net> wrote:
>>
>> In R 3.5.0 using the `encoding' argument of source() prevents loading
>> files from the internet; without the `encoding' argument files can be
>> loaded from the internet, but if they contain non-ascii characters,
>> these are not correctly displayed under MS-Windows (but they are
>> correctly displayed under GNU/Linux). With R 3.4.{2,3,4} there is no
>> such problem: using `encoding' the files are loaded and non-ascii
>> characters are correctly displayed under MS-Windows (but not without
>> `encoding'). Here is a transcript from R 3.5.0 under GNU/Linux (the
>> URLs are real, in case anyone wants to try and reproduce the problem):
>>
>>> ls()
>> character(0)
>>> source("http://home.versanet.de/~s-berman/source1.R", encoding="UTF-8")
>>> ls()
>> character(0)
>>> source("http://home.versanet.de/~s-berman/source2.R", encoding="UTF-8")
>>> ls()
>> character(0)
>>> source("http://home.versanet.de/~s-berman/source1.R")
>>> ls()
>> [1] "source.test1"
>>> source("http://home.versanet.de/~s-berman/source2.R")
>>> ls()
>> [1] "source.test1" "source.test2"
>>> source.test1()
>> [1] "This is a test."
>>> source.test2()
>> [1] "Non-ascii: äöüß"
>>
>> (The four non-ascii characters are Unicode 0xE4, 0xF6, 0xFC, 0xDF.)
>> With 3.5.0 under MS-Windows, the transcript is the same except for the
>> display of the last output, which is this:
>>
>> [1] "Non-ascii: Ã¤Ã¶Ã¼ÃŸ"
>>
>> (Here there are eight non-ascii characters, which display the Unicode
>> decompositions of the four non-ascii characters above.)
>>
>> Here is a transcript from R 3.4.3 under MS-Windows (under GNU/Linux it's
>> the same except that the non-ascii characters are also correctly
>> displayed even without the `encoding' argument):
>>
>>> ls()
>> character(0)
>>> source("http://home.versanet.de/~s-berman/source1.R")
>>> ls()
>> [1] "source.test1"
>>> source("http://home.versanet.de/~s-berman/source2.R")
>>> ls()
>> [1] "source.test1" "source.test2"
>>> source.test1()
>> [1] "This is a test."
>>> source.test2()
>> [1] "Non-ascii: Ã¤Ã¶Ã¼ÃŸ"
>>> rm(source.test2)
>>> ls()
>> [1] "source.test1"
>>> source("http://home.versanet.de/~s-berman/source2.R", encoding="UTF-8")
>>> ls()
>> [1] "source.test1" "source.test2"
>>> source.test2()
>> [1] "Non-ascii: äöüß"
>>
>> I did a web search but didn't find any reports of this issue, nor did I
>> see any relevant entry in the 3.5.0 NEWS, so this looks like a bug, but
>> maybe I've overlooked something. I'd be grateful for any enlightenment.
>>
>> Steve Berman
>>
>> ______________________________________________
>> R-devel using r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-devel
> --
> Peter Dalgaard, Professor,
> Center for Statistics, Copenhagen Business School
> Solbjerg Plads 3, 2000 Frederiksberg, Denmark
> Phone: (+45)38153501
> Office: A 4.23
> Email: pd.mes using cbs.dk Priv: PDalgd using gmail.com