[Rd] encoding argument of source() in 3.5.0

Martin Maechler maechler at stat.math.ethz.ch
Mon Jun 4 10:44:11 CEST 2018


>>>>> peter dalgaard 
>>>>>     on Sun, 3 Jun 2018 23:51:24 +0200 writes:

    > Looks like this actually comes from readLines(), nothing
    > to do with source() as such: In current R-devel (still):

    >> f <- file("http://home.versanet.de/~s-berman/source2.R", encoding="UTF-8")
    >> readLines(f)
    > character(0)
    >> close(f)
    >> f <- file("http://home.versanet.de/~s-berman/source2.R")
    >> readLines(f)
    > [1] "source.test2 <- function() {"   "    print(\"Non-ascii: äöüß\")"
    > [3] "}"                             

    > -pd

And it is not even readLines() itself, but rather how exactly the
connection is defined [even in your example above]:

  > urlR <- "http://home.versanet.de/~s-berman/source2.R"
  > readLines(urlR, encoding="UTF-8")
  [1] "source.test2 <- function() {"   "    print(\"Non-ascii: äöüß\")"
  [3] "}"                             
  > f <- file(urlR, encoding = "UTF-8")
  > readLines(f)
  character(0)

and the same behavior shows up with scan() instead of readLines():

  > scan(urlR,"") # works
  Read 7 items
  [1] "source.test2"       "<-"                 "function()"         "{"
  [5] "print(\"Non-ascii:" "äöüß\")"            "}"
  > scan(f,"") # fails
  Read 0 items
  character(0)
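For what it's worth, the two calls above do not use `encoding' in the
same way: as documented, readLines(<path>, encoding=) only marks the
strings it returns as UTF-8, whereas file(<path>, encoding=) re-encodes
the input while reading. A minimal sketch of that difference on a local
temporary file (so no network is involved; the file and its contents
are made up for illustration, and this does not reproduce the URL
failure itself):

```r
# Write a small UTF-8 file locally; the name and contents are illustrative only.
tmp <- tempfile(fileext = ".R")
txt <- enc2utf8("print(\"Non-ascii: \u00e4\u00f6\u00fc\u00df\")")
con <- file(tmp, open = "wb")
writeLines(txt, con, useBytes = TRUE)   # write the UTF-8 bytes unchanged
close(con)

# (a) Path + readLines(encoding=): bytes are read as-is and only *marked* UTF-8.
a <- readLines(tmp, encoding = "UTF-8")
Encoding(a)

# (b) Connection with encoding=: input is re-encoded from UTF-8 to the
#     native encoding while reading (may warn in a non-UTF-8 locale).
f <- file(tmp, encoding = "UTF-8")
b <- readLines(f)
close(f)
Encoding(b)
```

With a local file both variants return the line; the empty result seen
above only shows up when the connection is built on a URL.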

So it seems the bug is in the C code of file() [or url()].
But then we also have to consider Windows, where I think most of the
changes happened during the R 3.4.4 --> R 3.5.0 transition.
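
Until the connection code is fixed, one possible workaround, given that
readLines(urlR, encoding = "UTF-8") works above, is to read the lines
first and parse them separately instead of passing `encoding' to
source(). A rough sketch, where the helper name source_utf8 is
hypothetical (not part of R) and a local temporary file stands in for
the URL; how the non-ASCII strings then display on Windows is untested
here:

```r
# Hypothetical helper (not part of R): read the lines, mark them as UTF-8,
# then parse and evaluate them, avoiding file(..., encoding = ) entirely.
source_utf8 <- function(path_or_url, envir = parent.frame()) {
  lines <- readLines(path_or_url, encoding = "UTF-8", warn = FALSE)
  exprs <- parse(text = lines, keep.source = FALSE)
  invisible(eval(exprs, envir = envir))
}

# Local stand-in for the remote script, written as UTF-8 bytes.
tmp <- tempfile(fileext = ".R")
con <- file(tmp, open = "wb")
writeLines(
  enc2utf8("source.test2 <- function() print(\"Non-ascii: \u00e4\u00f6\u00fc\u00df\")"),
  con, useBytes = TRUE
)
close(con)

env <- new.env()
source_utf8(tmp, envir = env)
ls(env)
```

The same call should accept urlR in place of tmp, since readLines()
takes a URL string directly.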


    >> On 2 Jun 2018, at 15:37, Stephen Berman <stephen.berman at gmx.net> wrote:
    >> 
    >> In R 3.5.0 using the `encoding' argument of source() prevents loading
    >> files from the internet; without the `encoding' argument files can be
    >> loaded from the internet, but if they contain non-ascii characters,
    >> these are not correctly displayed under MS-Windows (but they are
    >> correctly displayed under GNU/Linux).  With R 3.4.{2,3,4} there is no
    >> such problem: using `encoding' the files are loaded and non-ascii
    >> characters are correctly displayed under MS-Windows (but not without
    >> `encoding').  Here is a transcript from R 3.5.0 under GNU/Linux (the
    >> URLs are real, in case anyone wants to try and reproduce the problem):
    >> 
    >>> ls()
    >> character(0)
    >>> source("http://home.versanet.de/~s-berman/source1.R", encoding="UTF-8")
    >>> ls()
    >> character(0)
    >>> source("http://home.versanet.de/~s-berman/source2.R", encoding="UTF-8")
    >>> ls()
    >> character(0)
    >>> source("http://home.versanet.de/~s-berman/source1.R")
    >>> ls()
    >> [1] "source.test1"
    >>> source("http://home.versanet.de/~s-berman/source2.R")
    >>> ls()
    >> [1] "source.test1" "source.test2"
    >>> source.test1()
    >> [1] "This is a test."
    >>> source.test2()
    >> [1] "Non-ascii: äöüß"
    >> 
    >> (The four non-ascii characters are Unicode 0xE4, 0xF6, 0xFC, 0xDF.)
    >> With 3.5.0 under MS-Windows, the transcript is the same except for the
    >> display of the last output, which is this:
    >> 
    >> [1] "Non-ascii: Ã¤Ã¶Ã¼ÃŸ"
    >> 
    >> (Here there are eight non-ascii characters, which display the Unicode
    >> decompositions of the four non-ascii characters above.)
    >> 
    >> Here is a transcript from R 3.4.3 under MS-Windows (under GNU/Linux it's
    >> the same except that the non-ascii characters are also correctly
    >> displayed even without the `encoding' argument):
    >> 
    >>> ls()
    >> character(0)
    >>> source("http://home.versanet.de/~s-berman/source1.R")
    >>> ls()
    >> [1] "source.test1"
    >>> source("http://home.versanet.de/~s-berman/source2.R")
    >>> ls()
    >> [1] "source.test1" "source.test2"
    >>> source.test1()
    >> [1] "This is a test."
    >>> source.test2()
    >> [1] "Non-ascii: Ã¤Ã¶Ã¼ÃŸ"
    >>> rm(source.test2)
    >>> ls()
    >> [1] "source.test1"
    >>> source("http://home.versanet.de/~s-berman/source2.R", encoding="UTF-8")
    >>> ls()
    >> [1] "source.test1" "source.test2"
    >>> source.test2()
    >> [1] "Non-ascii: äöüß"
    >> 
    >> I did a web search but didn't find any reports of this issue, nor did I
    >> see any relevant entry in the 3.5.0 NEWS, so this looks like a bug, but
    >> maybe I've overlooked something.  I'd be grateful for any enlightenment.
    >> 
    >> Steve Berman
    >> 
    >> ______________________________________________
    >> R-devel at r-project.org mailing list
    >> https://stat.ethz.ch/mailman/listinfo/r-devel

    > -- 
    > Peter Dalgaard, Professor,
    > Center for Statistics, Copenhagen Business School
    > Solbjerg Plads 3, 2000 Frederiksberg, Denmark
    > Phone: (+45)38153501
    > Office: A 4.23
    > Email: pd.mes at cbs.dk  Priv: PDalgd at gmail.com
