[Rd] The regular expressions in compareVersion()

peter dalgaard pdalgd at gmail.com
Fri Apr 25 14:50:53 CEST 2014


On 25 Apr 2014, at 14:04 , Duncan Murdoch <murdoch.duncan at gmail.com> wrote:

> On 24/04/2014, 10:27 PM, Simon Urbanek wrote:
>> FWIW, the link points to a long thread that is 90% irrelevant; AFAICS the relevant part is:
>> 
>> From: Yihui Xie-2
>> Sep 02, 2013; 4:11pm
>> Re: Sweave: printing an underscore in the output from an R command
>> [...]
>> Now you are good at the regular expression level, but Sweave comes and
>> bites you, and that is due to this bug in the regular expression in
>> Sweave Noweb syntax:
>> 
>>> SweaveSyntaxNoweb$docexpr
>> [1] "\\\\Sexpr\\{([^\\}]*)\\}"
>> 
>> It should have been "\\\\Sexpr\\{([^}]*)\\}", i.e. } does not need to
>> be escaped inside [], and \\ will be interpreted literally inside [],
>> so [^\\}] also excludes the backslash.
>> In your case, Sweave sees \ in \Sexpr{}, where the character class stops
>> matching; the next character is \ rather than }, so the match fails and
>> Sweave believes there are no inline R expressions in your document.
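A small R check of the difference described above. This is a sketch: the document line and the names buggy, fixed, line and expr are made up for illustration; only the two patterns are taken from the thread. With R's default regex engine, the class [^\}] excludes both } and the backslash, so an inline expression that itself contains a backslash is never matched by the old pattern.

```r
# Old and corrected Sweave \Sexpr{} patterns, as quoted in this thread
buggy <- "\\\\Sexpr\\{([^\\}]*)\\}"
fixed <- "\\\\Sexpr\\{([^}]*)\\}"

# Hypothetical document line whose inline R code contains a backslash.
# The actual text is:  x is \Sexpr{gsub('_', '\\_', 'a_b')} here
line <- "x is \\Sexpr{gsub('_', '\\\\_', 'a_b')} here"

grepl(buggy, line)   # [^\}] stops at the first backslash, so no match
grepl(fixed, line)   # [^}] runs on to the closing brace

# Pull out the inline expression (capture group 1) with the corrected pattern
expr <- regmatches(line, regexec(fixed, line))[[1L]][2L]
cat(expr, "\n")
```

The only change between the two patterns is dropping the backslash from the bracket expression; everything outside the brackets is identical.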
>> 
> 
> Thanks.  I've put in a bug report on this one now, so it shouldn't get missed again.  If nobody else gets to it first I'll deal with it.
> 
> I don't see any value in fixing the compareVersion example, but if someone submits a bug report about it, someone else might fix it.

No point in clinging to obviously incorrect code either. Fixed in R-devel.

Peter

> 
> Duncan Murdoch
> 
>> 
>> On Apr 24, 2014, at 10:15 PM, Yihui Xie <xie at yihui.name> wrote:
>> 
>>> You are right that this is unlikely to cause problems, because users
>>> are unlikely to put backslashes in version numbers. Henrik has pointed
>>> out the problem. It is not about "making the source code a little
>>> cleaner", but "making it correct". Either someone in R core corrects
>>> the wrong regular expressions in a few seconds (unless you think \ can
>>> be a legal character in a version number), or I simply withdraw the
>>> report. It seems the latter is easier. It is not worth any more
>>> back-and-forth.
>>> 
>>> Regarding the regular expression problem for \Sexpr{} in Sweave,
>>> please see here for a record:
>>> http://r.789695.n4.nabble.com/Sweave-printing-an-underscore-in-the-output-from-an-R-command-td4675177.html
>>> As I said, it is a similar problem: someone tried to escape a
>>> character that did not need to be escaped in [].
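The general rule behind both reports can be checked directly. A minimal sketch (the pattern and variable names are illustrative): with R's default engine (TRE), a backslash inside [] is an ordinary character, whereas with perl = TRUE (PCRE) it acts as an escape, so the same pattern means different things in the two engines.

```r
# Regex text [\}] : a bracket expression containing a backslash and }
pat <- "[\\}]"
backslash <- "\\"   # a one-character string holding a single backslash

tre  <- grepl(pat, backslash)               # default engine: \ is a class member
pcre <- grepl(pat, backslash, perl = TRUE)  # PCRE: \} is just an escaped }
c(tre = tre, pcre = pcre)

# Without the backslash, both engines agree that } needs no escaping in []
grepl("[}]", "}")
grepl("[}]", "}", perl = TRUE)
```

Writing the class without the backslash is therefore the form that behaves the same way under both engines.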
>>> 
>>> Regards,
>>> Yihui
>>> --
>>> Yihui Xie <xieyihui at gmail.com>
>>> Web: http://yihui.name
>>> 
>>> 
>>> On Thu, Apr 24, 2014 at 6:20 PM, Duncan Murdoch
>>> <murdoch.duncan at gmail.com> wrote:
>>>> On 24/04/2014, 5:26 PM, Henrik Bengtsson wrote:
>>>>> 
>>>>> On Thu, Apr 24, 2014 at 1:42 PM, Duncan Murdoch
>>>>> <murdoch.duncan at gmail.com> wrote:
>>>>>> 
>>>>>> On 24/04/2014, 1:11 PM, Yihui Xie wrote:
>>>>>>> 
>>>>>>> 
>>>>>>> Hi,
>>>>>>> 
>>>>>>> I guess the backslash should not be part of the separator pattern
>>>>>>> passed to strsplit() in compareVersion(), because a period inside [] is
>>>>>>> not a metacharacter (there is no need to "escape" it with a backslash):
>>>>>>> 
>>>>>>> 
>>>>>>> https://github.com/wch/r-source/blob/trunk/src/library/utils/R/packages.R#L866-L867
>>>>>>> 
>>>>>>>> compareVersion
>>>>>>> 
>>>>>>> 
>>>>>>> function (a, b)
>>>>>>> {
>>>>>>> ....
>>>>>>>      a <- as.integer(strsplit(a, "[\\.-]")[[1L]])
>>>>>>>      b <- as.integer(strsplit(b, "[\\.-]")[[1L]])
>>>>>>> ....
>>>>>>> <environment: namespace:utils>
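To see what the split pattern actually does, here is a short sketch using the version string from Henrik's example further down (the variable names are illustrative). The old pattern "[\\.-]" splits on backslashes as well as on periods and hyphens; "[.-]" splits only on the two intended separators.

```r
v <- "3.14-59\\26"   # the actual string is 3.14-59\26

old <- strsplit(v, "[\\.-]")[[1L]]  # class {\, ., -}: backslash is a separator
new <- strsplit(v, "[.-]")[[1L]]    # class {., -}: backslash stays in the token

old   # "3" "14" "59" "26"
new   # "3" "14" "59\\26"
```

With the corrected pattern, as.integer() turns the last component "59\26" into NA (with a warning), so a malformed string is no longer silently read as version 3.14-59.26.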
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> Could you post an example where this causes trouble, or are you just
>>>>>> suggesting this as a way to make the source a little cleaner?
>>>>> 
>>>>> 
>>>>> Maybe it's already clear, but [\\.] is the set of the two characters '\'
>>>>> and '.', not '.' alone.  For example, I would expect an error below:
>>>>> 
>>>>>> compareVersion("3.14-59.26", "3.14-59\\26")
>>>>> 
>>>>> [1] 0
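Henrik expects an error here. One way to get that is to validate the string before splitting it. The function below, compare_version_strict(), is a hypothetical sketch and not R's compareVersion(); it returns 1, 0 or -1 in the same spirit, and it assumes that when all shared components are equal the longer version counts as newer. Base R's package_version() performs a similar validity check and should reject this string as well.

```r
# Hypothetical stricter variant of compareVersion(); not part of R.
compare_version_strict <- function(a, b) {
  # Accept only digits separated by '.' or '-', then split into integers
  parse_ver <- function(v) {
    if (!grepl("^[0-9]+([.-][0-9]+)*$", v))
      stop("invalid version string: ", v)
    as.integer(strsplit(v, "[.-]")[[1L]])
  }
  a <- parse_ver(a)
  b <- parse_ver(b)
  # The first differing component decides the result
  for (k in seq_len(min(length(a), length(b)))) {
    if (a[k] > b[k]) return(1L)
    if (a[k] < b[k]) return(-1L)
  }
  # All shared components are equal: the longer version counts as newer
  as.integer(sign(length(a) - length(b)))
}

compare_version_strict("3.14-59.26", "3.14-59.26")        # 0
compare_version_strict("1.10", "1.9")                      # 1
try(compare_version_strict("3.14-59.26", "3.14-59\\26"))  # error
```

Anchoring the validation regex with ^ and $ is what turns the stray backslash into an error instead of a separator.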
>>>>> 
>>>> 
>>>> How does that cause problems?
>>>> 
>>>> Duncan Murdoch
>>>> 
>>>> 
>>>>> /Henrik
>>>>> 
>>>>>> 
>>>>>> 
>>>>>>> 
>>>>>>> A similar regular expression problem also exists in the Sweave syntax
>>>>>>> (for \Sexpr{}); I reported it once. It was fixed, but the fix
>>>>>>> was immediately reverted for some reason:
>>>>>>> 
>>>>>>> 
>>>>>>> https://github.com/wch/r-source/commit/52b0a46e15136a7f9e4777e9960fdda6d84880c0
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> A link to your report would be more useful if it included an example
>>>>>> where the bad regexp causes trouble.
>>>>>> 
>>>>>> Duncan Murdoch
>>> 
>>> ______________________________________________
>>> R-devel at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-devel
>>> 
>> 

-- 
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd.mes at cbs.dk  Priv: PDalgd at gmail.com


