[Rd] encoding argument of source() in 3.5.0

peter dalgaard pdalgd using gmail.com
Mon Jun 4 11:31:30 CEST 2018


It's not Windows-specific, though. My example was on a Mac...

I hope we can sort this out before 3.5.1.

-pd

> On 4 Jun 2018, at 10:44 , Martin Maechler <maechler using stat.math.ethz.ch> wrote:
> 
> So it seems the bug is in the file() [or url()] C code.
> But then we also have to consider Windows, where I think most changes
> happened during the R-3.4.4 --> R-3.5.0 transition.
> 
> 
>>> On 2 Jun 2018, at 15:37 , Stephen Berman <stephen.berman using gmx.net> wrote:
>>> 
>>> In R 3.5.0 using the `encoding' argument of source() prevents loading
>>> files from the internet; without the `encoding' argument files can be
>>> loaded from the internet, but if they contain non-ascii characters,
>>> these are not correctly displayed under MS-Windows (but they are
>>> correctly displayed under GNU/Linux).  With R 3.4.{2,3,4} there is no
>>> such problem: using `encoding' the files are loaded and non-ascii
>>> characters are correctly displayed under MS-Windows (but not without
>>> `encoding').  Here is a transcript from R 3.5.0 under GNU/Linux (the
>>> URLs are real, in case anyone wants to try and reproduce the problem):
>>> 
>>>> ls()
>>> character(0)
>>>> source("http://home.versanet.de/~s-berman/source1.R", encoding="UTF-8")
>>>> ls()
>>> character(0)
>>>> source("http://home.versanet.de/~s-berman/source2.R", encoding="UTF-8")
>>>> ls()
>>> character(0)
>>>> source("http://home.versanet.de/~s-berman/source1.R")
>>>> ls()
>>> [1] "source.test1"
>>>> source("http://home.versanet.de/~s-berman/source2.R")
>>>> ls()
>>> [1] "source.test1" "source.test2"
>>>> source.test1()
>>> [1] "This is a test."
>>>> source.test2()
>>> [1] "Non-ascii: äöüß"
>>> 
>>> (The four non-ascii characters are Unicode 0xE4, 0xF6, 0xFC, 0xDF.)
>>> With 3.5.0 under MS-Windows, the transcript is the same except for the
>>> display of the last output, which is this:
>>> 
>>> [1] "Non-ascii: Ã¤Ã¶Ã¼ÃŸ"
>>> 
>>> (Here there are eight non-ascii characters, which display the Unicode
>>> decompositions of the four non-ascii characters above.)
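
One reading of the eight-character output described above is that the UTF-8 bytes of the four characters are being interpreted one byte at a time as Latin-1. A minimal sketch in base R, assuming that reading is right (the variable names are made up for illustration and are not taken from source1.R or source2.R):

```r
# UTF-8 bytes for the four characters U+00E4, U+00F6, U+00FC, U+00DF
utf8_bytes <- as.raw(c(0xC3, 0xA4, 0xC3, 0xB6, 0xC3, 0xBC, 0xC3, 0x9F))

# Correct interpretation: declare the bytes to be UTF-8
ok <- rawToChar(utf8_bytes)
Encoding(ok) <- "UTF-8"
ok_codepoints <- utf8ToInt(ok)            # one code point per character
ok_nchar <- nchar(ok, type = "chars")

# Wrong interpretation: treat every byte as a separate Latin-1 character
# and convert that to UTF-8, which doubles the number of characters
misread <- iconv(rawToChar(utf8_bytes), from = "latin1", to = "UTF-8")
misread_nchar <- nchar(misread, type = "chars")

print(ok_codepoints)
print(c(correct = ok_nchar, misread = misread_nchar))
```

This only illustrates the display symptom; it says nothing about where in source(), file() or url() the encoding information gets lost.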
>>> 
>>> Here is a transcript from R 3.4.3 under MS-Windows (under GNU/Linux it's
>>> the same except that the non-ascii characters are also correctly
>>> displayed even without the `encoding' argument):
>>> 
>>>> ls()
>>> character(0)
>>>> source("http://home.versanet.de/~s-berman/source1.R")
>>>> ls()
>>> [1] "source.test1"
>>>> source("http://home.versanet.de/~s-berman/source2.R")
>>>> ls()
>>> [1] "source.test1" "source.test2"
>>>> source.test1()
>>> [1] "This is a test."
>>>> source.test2()
>>> [1] "Non-ascii: Ã¤Ã¶Ã¼ÃŸ"
>>>> rm(source.test2)
>>>> ls()
>>> [1] "source.test1"
>>>> source("http://home.versanet.de/~s-berman/source2.R", encoding="UTF-8")
>>>> ls()
>>> [1] "source.test1" "source.test2"
>>>> source.test2()
>>> [1] "Non-ascii: äöüß"
>>> 
>>> I did a web search but didn't find any reports of this issue, nor did I
>>> see any relevant entry in the 3.5.0 NEWS, so this looks like a bug, but
>>> maybe I've overlooked something.  I'd be grateful for any enlightenment.
>>> 
>>> Steve Berman
>>> 

-- 
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Office: A 4.23
Email: pd.mes using cbs.dk  Priv: PDalgd using gmail.com


