[R-pkg-devel] R CMD checks URLs formatted for LaTeX instead of using the non-LaTeX URLs, and fails

Sebastian Meyer seb.meyer using fau.de
Thu Jul 7 22:41:23 CEST 2022


On 05.07.22 at 19:56, Ralf Herold wrote:
> Thanks and I would like to define the follow-up actions:
> 
> 1) change function "writeURL" on line 207 to read, for example, url <- 
> fsub("([%#&])", "\\\\1",  url) in 
> https://svn.r-project.org/R/trunk/src/library/tools/R/Rd2latex.R to 
> always escape URLs as you mention.
> 
> How can this be moved forward? This seems to be R core code, so it needs 
> to be reported in R Bugzilla by one of its members (I am not one): 
> https://www.r-project.org/bugs.html#where-to-submit-bug-reports-and-patches
> 

A Bugzilla entry would have been nice for future reference but is no 
longer necessary. The Rd2latex() bug is now fixed in the development 
version of R (>= r82557) such that URLs with & or # characters can then 
also be used inside \tabular and give the same link as in the HTML 
version: \tabular{l}{\url{https://example.org/a&b#c}} should just work.
In other words, Rd2latex() now correctly handles the input URL as 
'verbatim' text (as specified in WRE Section 2.3), which also means that 
backslashes in the input that do not escape percent or braces (Rd 
specials) are preserved in the output (as was already the case for HTML).
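
As a rough illustration of the "always escape [&%#]" idea discussed 
further down in this thread, here is a small R sketch. The helper name 
escape_url_latex() is hypothetical and this is not the actual code 
committed to Rd2latex(); it only shows the kind of substitution meant.

```r
# Hypothetical helper (not part of the tools package): prefix the
# characters %, # and & in a URL with a backslash for use in LaTeX.
escape_url_latex <- function(url) {
  # In the replacement string, "\\\\" yields one literal backslash and
  # "\\1" is the back-reference to the matched character.
  gsub("([%#&])", "\\\\\\1", url)
}

escaped <- escape_url_latex("https://example.org/a&b#c")
cat(escaped, "\n")
```

The URL is left untouched apart from the three special characters, so 
the HTML version can keep using the unescaped input.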

It is planned to port the fix to R-patched (future R 4.2.2).

In package development I'd probably avoid such URLs inside \tabular 
until after that release. Otherwise, if you want to support building the 
PDF manual in current and future R, you'd need to use \out{} and do all 
the escaping there yourself, for example:

\name{test}
\title{test}
\description{
\tabular{l}{
    \ifelse{latex}{
      \out{\href{https://example.org/a\&b\#c}{link}}
    }{
      \href{https://example.org/a&b#c}{link}
    }
}
}
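
To see what your installed R version generates for the example above, 
something along these lines should work. This is only a sketch: the 
temporary file names are arbitrary, and I do not show the output 
because it depends on the R version.

```r
# Write the example Rd file from above to a temporary location.
rd_lines <- c(
  "\\name{test}",
  "\\title{test}",
  "\\description{",
  "\\tabular{l}{",
  "    \\ifelse{latex}{",
  "      \\out{\\href{https://example.org/a\\&b\\#c}{link}}",
  "    }{",
  "      \\href{https://example.org/a&b#c}{link}",
  "    }",
  "}",
  "}"
)
rd_file <- tempfile(fileext = ".Rd")
writeLines(rd_lines, rd_file)

# Convert to LaTeX and look at the line(s) containing the link.
tex_file <- tempfile(fileext = ".tex")
tools::Rd2latex(rd_file, out = tex_file)
tex <- readLines(tex_file)
print(grep("example.org", tex, fixed = TRUE, value = TRUE))
```

With a fixed R version, replacing the whole \ifelse block by the plain 
\href line should produce an equivalent link in the LaTeX output.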

AFAICS, the below points are obsolete.

Best regards,

	Sebastian Meyer

> 
> Could someone from this list do this? Many thanks
> 
> 2) change function "url_db_from_package_Rd_db" to call 
>   ".get_urls_from_Rd" with parameter "ifdef = TRUE" on line 178 in 
> https://svn.r-project.org/R/trunk/src/library/tools/R/urltools.R. This 
> will activate the existing code that is intended to handle ifdef{}{}{}. 
> This seems important for the issue I have reported and beyond. Same 
> procedure as above?
> 
> 3) change Rd.sty, not sure, 1) seems more relevant.
> 
> Please advise, thanks
> 
> Ralf
> 
> 
> 
>> On 04.07.2022 at 00:08, Sebastian Meyer <seb.meyer using fau.de> wrote:
>>
>> On 03.07.22 at 08:27, Ralf Herold wrote:
>>> Thanks Sebastian,
>>> but not only the hash, also an ampersand in \href in a tabular environment 
>>> needs to be escaped, otherwise LaTeX compilation fails (example below). 
>>> I was not aware it is a known limitation for .Rd files despite 
>>> searching for it.
>>
>> I stumbled over that problem a while ago and found that the escaping 
>> issue for the hash symbol is documented in the hyperref manual (but 
>> currently not accounted for by Rd2latex):
>>
>>> The special characters # and ~ do *not* need to be escaped in any way 
>>> (unless the command is used in the argument of another command).
>>
>> For example, this LaTeX code fails to compile:
>> \emph{\href{https://example.org/#}{hash}}
>> In contrast, an ampersand would not need to be escaped in that LaTeX 
>> example.
>>
>> However, I can confirm that a LaTeX error results if an ampersand is 
>> used in a \href URL (but not in \url) that is passed to the special 
>> \Tabular LaTeX command from Rd.sty that is used by Rd2latex() for 
>> \tabular Rd input. Thank you for the heads-up.
>>
>> I think it would be good to improve Rd2latex() / Rd.sty for URLs in 
>> \tabular that contain & or # rather than require special LaTeX 
>> treatment in the Rd source. My preliminary testing shows that hyperref 
>> is happy if [&%#] are always escaped in URLs (sometimes it is not 
>> necessary but it also does not seem to hurt).
>>
>> Best regards,
>>
>> Sebastian Meyer
>>
>>> My use case (with eventually more meaningful query parameters and 
>>> possibly anchors) would work if the existing R code block for 
>>> handling \ifelse in urltools.R was activated as shown below, and this 
>>> is my suggestion. How could I propose this?
>>> Kind regards,
>>> Ralf
>>> \name{mre}
>>> \title{mre}
>>> \description{mre}
>>> \details{
>>> \tabular{l}{
>>>   \href{https://clinicaltrials.gov/ct2/results?cond=Infections&rslt=With}{link}
>>> }}
>>> LaTeX errors:
>>> ! Argument of \href@split has an extra }.
>>> <inserted text>
>>>                 \par
>>> l.24 }
>>> Runaway argument?
>>> https://clinicaltrials.gov/ct2/results?cond=Infections\unskip \hfil
>>> ! Paragraph ended before \href@split was complete.
>>> <to be read again>
>>>                    \par
>>> l.24 }
>>> ! Extra }, or forgotten \endgroup.
>>> <recently read> }
>>>> On 03.07.2022 at 01:51, Sebastian Meyer <seb.meyer using fau.de> wrote:
>>>>
>>>> On 02.07.22 at 12:01, Ralf Herold wrote:
>>>>> Hello, in my package documentation I want to include URLs with 
>>>>> query string parameters and anchors, within a table. A minimally 
>>>>> reproducible example is this content in file "man/mre.Rd":
>>>>> \name{mre}
>>>>> \title{mre}
>>>>> \description{mre}
>>>>> \details{
>>>>> \tabular{l}{
>>>>>   \ifelse{latex}{\href{https://clinicaltrials.gov/ct2/results?cond=Infections\&rslt=With\#tableTop}{latex 
>>>>> link}}{\href{https://clinicaltrials.gov/ct2/results?cond=Infections&rslt=With#tableTop}{non-latex 
>>>>> link}}
>>>>> }}
>>>>> The ifelse{}{}{} construct is necessary since ampersands in a table 
>>>>> need to be escaped for LaTeX rendering.
>>>>
>>>> This is a red herring. Ampersands do *not* need to be escaped in 
>>>> \href URLs. The problem is the hash symbol, which needs to be 
>>>> escaped if \href is nested within another markup macro, here 
>>>> \Tabular (from Rd.sty). This is a known limitation; Rd2latex will 
>>>> probably do the escaping in the future. It's good to see a use case.
>>>>
>>>> I think currently the best solutions for you are to simply omit the 
>>>> #tableTop part in the LaTeX version or to not use such URLs inside a 
>>>> \tabular.
>>>>
>>>> Hope this helps.
>>>> Best regards,
>>>>
>>>> Sebastian Meyer
>>>>
>>>>> Each of the following commands checks and renders the respective 
>>>>> output correctly:
>>>>> tools::checkRd("man/mre.Rd")
>>>>> tools::Rd2txt("man/mre.Rd")
>>>>> tools::Rd2latex("man/mre.Rd")
>>>>> tools::Rd2HTML("man/mre.Rd")
>>>>> system2("R", c("CMD", "Rd2pdf", "man/mre.Rd"))
>>>>> However, rhub::check_for_cran() results in NOTES:
>>>>> Found the following (possibly) invalid URLs:
>>>>>   URL: 
>>>>> https://clinicaltrials.gov/ct2/results?cond=Infections\&rslt=With\#tableTop
>>>>>     From: man/mre.Rd
>>>>>     Status: 400
>>>>>     Message: Bad Request
>>>>> Subsequently, CRAN maintainers declined to accept the package.
>>>>> However, the underlying cause is that, during such checks, all 
>>>>> apparent URLs are extracted from .Rd files, irrespective of any 
>>>>> \ifelse{}{}{} constructs. This in turn is due to such checks 
>>>>> involving calls to function ".get_urls_from_Rd" without setting its 
>>>>> argument "ifdef" to TRUE.
>>>>> Here is how to see this behaviour:
>>>>> db <- tools::Rd_db(dir = ".")
>>>>> # get functions
>>>>> source("https://svn.r-project.org/R/trunk/src/library/tools/R/urltools.R")
>>>>> source("https://svn.r-project.org/R/trunk/src/library/tools/R/utils.R")
>>>>> .Rd_deparse <- tools:::.Rd_deparse
>>>>> RdTags <- tools:::RdTags
>>>>> # default, leading to invalid url in [1]
>>>>> # > .get_urls_from_Rd(db)
>>>>> # [1] "https://clinicaltrials.gov/ct2/results?cond=Infections\\&rslt=With\\#tableTop"
>>>>> # [2] "https://clinicaltrials.gov/ct2/results?cond=Infections&rslt=With#tableTop"
>>>>> # returning relevant valid url
>>>>> # > .get_urls_from_Rd(db, ifdef = TRUE)
>>>>> # [1] "https://clinicaltrials.gov/ct2/results?cond=Infections&rslt=With#tableTop"
>>>>> This can be addressed by either:
>>>>> -- changing the signature of ".get_urls_from_Rd" in line 50 in 
>>>>> https://svn.r-project.org/R/trunk/src/library/tools/R/urltools.R 
>>>>> to read "ifdef = TRUE". Of note, this function has a code block to 
>>>>> handle such ifdef constructs which indicates it should be possible 
>>>>> to use them in Rd files.
>>>>> -- changing the calling function "url_db_from_package_Rd_db" to 
>>>>> include "ifdef = TRUE" on line 178 in 
>>>>> https://svn.r-project.org/R/trunk/src/library/tools/R/urltools.R
>>>>> Please advise how to advance on this issue, thank you very much.
>>>>> Greetings
>>>>> Ralf
>>>>> ______________________________________________
>>>>> R-package-devel using r-project.org mailing list
>>>>> https://stat.ethz.ch/mailman/listinfo/r-package-devel
>