[R-pkg-devel] Check Error Due to Unicode in Documentation

Thu Jul 23 22:58:07 CEST 2020

On 23/07/2020 4:14 p.m., bill using denney.ws wrote:
> Hello,
> 
>   
> 
> I have a personal package that I'd eventually like to clean up and either
> find other packages to be homes for the functions or perhaps eventually
> release it on CRAN.  To that end, I try to keep package checks working.
> 
>   
> 
> One of the functions that I use tries to simplify Unicode text to ASCII.
> With that, I tend to receive data that is scientifically focused, so the mu
> character should be converted to a "u" instead of the standard conversion to
> "m".  On top of that, there are at least two Unicode characters that are
> visually the mu character: one is the micro character and the other is an
> actual lowercase mu.  This function converts both of those to "u" as
> desired.
> 
>   
> 
> I generate the documentation using roxygen2, but the text in the
> documentation aligns with the expected Unicode character, so I think the
> issue is not with roxygen.
> 
>   
> 
> The issue is that Codoc gives the following error:
> 
>   
> 
> * checking for code/documentation mismatches ... WARNING
> Codoc mismatches from documentation object 'unicode_to_ascii':
> unicode_to_ascii.character
>    Code: function(x, verbose = FALSE, pattern = c("μ", "µ"), replacement
>                   = c("u", "u"), general_
> 
>   
> 
> But, the code and documentation appear to be the same.  I think that the
> issue relates to something with Unicode support in Codoc, but I'm not sure
> how to test for that.  The code is here:
> 
>   
> 
> https://github.com/billdenney/bsd.report/blob/454caf217c5b333af1d65c7e63bbad4194320e07/R/unicode_to_ascii.R#L28-L31
> 
>   
> 
> And the documentation is here:
> 
>   
> 
> https://github.com/billdenney/bsd.report/blob/454caf217c5b333af1d65c7e63bbad4194320e07/man/unicode_to_ascii.Rd#L17-L24
> 
>   
> 
> Do you have any suggestions on how to make this code/documentation work with
> Codoc?

If you change the source to include the explicit characters (i.e. use 
pattern = c("μ", "µ") instead of pattern=c("\u03bc", "\u00b5")), does 
that help?

It may cause other issues:  WRE recommends against including UTF-8 chars 
in source code.
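
To make the trade-off concrete, here is a minimal sketch in base R. It is not taken from the package under discussion, and the names has_non_ascii, line_escaped and line_literal are hypothetical. It shows that the two look-alike characters are different code points, and that a source line written with \u escapes stays pure ASCII while the same line with the characters embedded does not.

```r
# The two look-alike characters, written as ASCII-only escapes.
greek_mu   <- "\u03bc"  # U+03BC GREEK SMALL LETTER MU
micro_sign <- "\u00b5"  # U+00B5 MICRO SIGN

# Different code points despite the near-identical glyphs.
code_points <- c(utf8ToInt(enc2utf8(greek_mu)),
                 utf8ToInt(enc2utf8(micro_sign)))

# Hypothetical helper: TRUE for each element containing a non-ASCII character.
has_non_ascii <- function(x) {
  vapply(x,
         function(s) any(utf8ToInt(enc2utf8(s)) > 127L),
         logical(1),
         USE.NAMES = FALSE)
}

# A source line as typed with escapes (the backslashes are literal text here) ...
line_escaped <- 'pattern = c("\\u03bc", "\\u00b5")'
# ... and the same line with the characters themselves embedded.
line_literal <- paste0('pattern = c("', greek_mu, '", "', micro_sign, '")')

flags <- has_non_ascii(c(line_escaped, line_literal))
print(code_points)
print(flags)
```

A check along these lines only reports whether a line of source text is ASCII. It does not say which spelling the code/documentation comparison expects.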

If that doesn't solve the problem, then it looks like an issue with 
Roxygen2.  I don't know if there's a way to tell it not to convert \u 
escapes into the corresponding character.  If there isn't, it seems like 
that's something they should add.  As a workaround, is there a way to 
say that this one particular .Rd file should be edited by hand, instead 
of auto-generated?
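
On that last question, one possibility to verify against the roxygen2 documentation: roxygen2 marks the .Rd files it manages with a first-line comment, "% Generated by roxygen2: do not edit by hand", and to my knowledge it declines to overwrite an existing file that lacks that line. Treat that as an assumption. The sketch below only tests for the marker and uses throwaway temporary files; is_roxygen_managed is a hypothetical helper, not part of roxygen2.

```r
# Assumption: roxygen2 puts this comment on the first line of files it manages.
roxygen_header <- "% Generated by roxygen2: do not edit by hand"

# Hypothetical helper: does the file start with the roxygen2 marker comment?
is_roxygen_managed <- function(path) {
  first <- readLines(path, n = 1L, warn = FALSE)
  length(first) == 1L && startsWith(first, "% Generated by roxygen2")
}

# Demonstration on two temporary files rather than a real package.
managed <- tempfile(fileext = ".Rd")
by_hand <- tempfile(fileext = ".Rd")
writeLines(c(roxygen_header, "\\name{demo}"), managed)
writeLines("\\name{demo}", by_hand)

res <- c(is_roxygen_managed(managed), is_roxygen_managed(by_hand))
print(res)

unlink(c(managed, by_hand))
```

If the assumption holds, removing the marker line (and the roxygen block for that function) would leave the file to be maintained by hand, with \u escapes kept or expanded as needed.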

Duncan Murdoch