[R-pkg-devel] "found non-ASCII strings" with save(version = 2)

Vincent Arel-Bundock
Wed Feb 5 14:32:04 CET 2020


Hi everyone,

My `countrycode` package ships with two data frames that contain character strings in several languages: `codelist` and `codelist_panel`.

I converted all strings to UTF-8 with the `enc2utf8` function, and I also tried several other approaches (the stringi package, among others). As far as I can tell, all of the strings are now valid UTF-8:

library(stringi)

# Download the data set shipped with the package and load it
url <- 'https://github.com/vincentarelbundock/countrycode/raw/master/data/codelist.rda'
temp <- tempfile()
download.file(url, temp)
load(temp)

# Keep the character columns and test every non-missing string for valid UTF-8
tmp <- codelist[, sapply(codelist, is.character)]
all(unlist(lapply(tmp, function(x) stri_enc_isutf8(na.omit(x)))))
[1] TRUE
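
One caveat on the check above: `stri_enc_isutf8` only tests whether the bytes of each string form valid UTF-8. It does not report the encoding mark that R has attached to the string, which is what `Encoding()` returns. As a minimal sketch, the marks on the non-ASCII strings could be tabulated as follows. The helper name `encoding_marks` and the toy data frame are hypothetical stand-ins and are not part of `countrycode`.

```r
# Hypothetical helper: tabulate the declared encoding marks of all
# non-ASCII strings found in the character columns of a data frame.
encoding_marks <- function(df) {
  chr_cols <- df[vapply(df, is.character, logical(1))]
  strings <- unique(as.character(unlist(chr_cols, use.names = FALSE)))
  strings <- strings[!is.na(strings)]
  # A string is non-ASCII if any of its bytes is above 0x7f
  non_ascii <- vapply(
    strings,
    function(s) any(as.integer(charToRaw(s)) > 127L),
    logical(1),
    USE.NAMES = FALSE
  )
  table(Encoding(strings[non_ascii]))
}

# Toy stand-in for `codelist` (assumption: the real object is loaded separately)
toy <- data.frame(
  name = c("Germany", "W\u00fcrtemberg"),
  code = c(1L, 2L),
  stringsAsFactors = FALSE
)
marks <- encoding_marks(toy)
print(marks)
```

Running `encoding_marks(codelist)` on the real object would show whether any non-ASCII strings carry a mark other than "UTF-8".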

After encoding, I saved the data frames with this command:

save(codelist, file = 'data/codelist.rda', compress = 'xz', version = 2)

Yet, when I run R CMD check, I get the following warning:

checking data for non-ASCII characters ... WARNING
    Warning: found non-ASCII strings
    'W<c3><bc>rtemberg' in object 'codelist'
    'S<c3><a3>o Tom<c3><a9> and Pr<c3><ad>ncipe' in object 'codelist'
    'W<c3><bc>rtemberg' in object 'codelist_panel'
    'S<c3><a3>o Tom<c3><a9> and Pr<c3><ad>ncipe' in object 'codelist_panel'
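
For reference, the `<c3><bc>` escapes in the warning are raw bytes: `c3 bc` is the two-byte UTF-8 encoding of `ü`. The sketch below uses only base R and an illustrative byte vector. It shows that a string can hold valid UTF-8 bytes and still carry no encoding mark, so byte validity and the mark are separate properties. Whether that is what triggers the warning in my case is an assumption I have not confirmed.

```r
# Bytes of "Würtemberg" in UTF-8: 'W', then c3 bc for 'ü', then "rtemberg"
bytes <- c(as.raw(0x57), as.raw(c(0xc3, 0xbc)), charToRaw("rtemberg"))
s <- rawToChar(bytes)

unmarked <- Encoding(s)   # mark on the string as created by rawToChar()
valid <- validUTF8(s)     # byte-level validity, independent of the mark

# Declare the bytes to be UTF-8 without changing them
Encoding(s) <- "UTF-8"
marked <- Encoding(s)

print(c(before = unmarked, after = marked))
```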

This warning disappears if I save the data frames using `save(version = 3)`. However, I would prefer to use version 2, because the version 3 format can only be read by R 3.5.0 or later and I want to stay compatible with older versions of R.
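
To separate the effect of the file format from the effect of the data, a small round trip through both formats can help. This is only a sketch with a made-up string and temporary files, and the helper name `round_trip` is hypothetical. It checks what survives `save()` and `load()`, not what `R CMD check` does internally, and the `version = 3` call assumes R 3.5.0 or later.

```r
# Illustrative string, explicitly converted to UTF-8
x <- enc2utf8("S\u00e3o Tom\u00e9 and Pr\u00edncipe")

# Hypothetical helper: save an object in the given format version,
# load it back into a fresh environment, and return the loaded copy.
round_trip <- function(obj, version) {
  f <- tempfile(fileext = ".rda")
  on.exit(unlink(f))
  save(obj, file = f, compress = "xz", version = version)
  e <- new.env()
  load(f, envir = e)
  e$obj
}

v2 <- round_trip(x, version = 2)
v3 <- round_trip(x, version = 3)

# Compare the encoding mark and the raw bytes after each round trip
result <- data.frame(
  version = c(2, 3),
  encoding = c(Encoding(v2), Encoding(v3)),
  same_bytes = c(
    identical(charToRaw(v2), charToRaw(x)),
    identical(charToRaw(v3), charToRaw(x))
  )
)
print(result)
```

If both rows agree for a given string, the difference between the two formats presumably lies in how the check reads the file rather than in the stored bytes, though that remains a guess on my part.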

Does anyone have suggestions for how to handle this? What did I miss?

Thanks a lot for your time!

Vincent

--
Vincent Arel-Bundock

Professeur agrégé / Associate professor
http://arelbundock.com
Université de Montréal, Science politique
3150 rue Jean-Brillant, Pav. Lionel-Groulx, C-4020
Montréal, Québec, Canada, H3T 1N8


