[Rd] readLines interaction with gsub different in R-dev

Dirk Eddelbuettel edd at debian.org
Sat Feb 17 16:15:19 CET 2018


On 17 February 2018 at 21:10, Hugh Parsonage wrote:
| I was told to re-raise this issue with R-dev:
| 
| In the documentation of R-dev and R-3.4.3, under ?gsub
| 
| > replacement
| >    ... For perl = TRUE only, it can also contain "\U" or "\L" to convert the rest of the replacement to upper or lower case and "\E" to end case conversion.
| 
| However, the following code runs differently:
| 
| tempf <- tempfile()
| writeLines(enc2utf8("author: Amélie"), con = tempf, useBytes = TRUE)
| entry <- readLines(tempf, encoding = "UTF-8")
| gsub("(\\w)", "\\U\\1", entry, perl = TRUE)
| 
| 
| "AUTHOR: AMÉLIE"  # R-3.4.3
| 
| "A"                              # R-dev

Confirmed for R-devel (current) on Ubuntu 17.10.  But ... isn't the regexp
you use wrong, ie isn't R-devel giving the correct answer?

R> tempf <- tempfile()
R> writeLines(enc2utf8("author: Amélie"), con = tempf, useBytes = TRUE)
R> entry <- readLines(tempf, encoding = "UTF-8")
R> gsub("(\\w)", "\\U\\1", entry, perl = TRUE)
[1] "A"
R> gsub("(\\w+)", "\\U\\1", entry, perl = TRUE)
[1] "AUTHOR"
R> gsub("(.*)", "\\U\\1", entry, perl = TRUE)
[1] "AUTHOR: AMÉLIE"
R> 

Dirk

-- 
http://dirk.eddelbuettel.com | @eddelbuettel | edd at debian.org



More information about the R-devel mailing list