[Rd] Experimental Rd parser in trunk.

Thu Nov 13 20:41:17 CET 2008

Just one additional comment, inline below:

On 11/13/2008 1:44 PM, Duncan Murdoch wrote:
> On 11/13/2008 11:51 AM, Simon Urbanek wrote:
>> Duncan,
>> 
>> I had a quick look at the differences between the parsers, and I'm
>> worried about points 1 and 2 (on p. 6). Do they imply that \R{} is
>> illegal, and so is \foo{} for any macro \foo that takes no arguments?
>> IMHO that would be fatal (if I understand it correctly), since that
>> construct is very often used (and I know of no alternative) when you
>> reference a macro that is followed by something other than a space.
>> E.g., 1\foo{}2 cannot be written as 1\foo2 (as per point 6), so if
>> \foo{} is disallowed there is no way to call \foo between 1 and 2
>> without generating a space.
>> Maybe I'm just interpreting it incorrectly, so I just wanted to point
>> out that issue.
> 
> Thanks for the comment.  You are interpreting it correctly, and that is 
> something that probably needs to change.
> 
> The reasoning behind the current choice is that macros with optional 
> arguments are ambiguous:  for example, in R code, {} might be part of 
> the code, not something for the Rd parser.  We currently have \eqn and 
> \deqn, which take one or two args, but they're not going to occur in R 
> code, so things currently work.  (But if you want to see ugly Bison 
> coding, look at how those VERBMACRO2 macros are handled.  The Rd format 
> is not easy to parse, being a mix of Latex-like stuff, R code, and just 
> about anything else in verbatim sections.)
> 
> So I'd really strongly prefer to say that \foo *always* requires an arg, 
> rather than let it be optional, if there are circumstances where it 
> needs one.
> 
> If we say that \foo never takes an arg, we'll need a way to indicate 
> whether the following space is significant.  One way is to 
> allow {} or some other marker that signals a break without inserting 
> anything, and is only interpreted in Latex-like mode.  Another way (which 
> I prefer) is described below.

I should say that allowing {} to immediately follow one of the 5 no-arg 
macros, and having it gobbled up by the lexer, would be relatively easy 
to implement.  Then the two examples below could be coded as 
"1\dots{}10" versus "1\dots 10", which I think is what you were asking 
for.  I have a mild preference for adding \sp (I don't like special 
cases), but not a strong one.
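
A toy sketch of the two options, for plain text containing only the 
no-arg macros. It is written in Python rather than the Bison grammar 
that parse_Rd() uses, and the function name render(), the "gobble"/"sp" 
mode switch, and the plain-text output chosen for \cr and \tab are 
illustrative assumptions; \sp is only a proposal.  In "gobble" mode an 
empty {} right after a no-arg macro is swallowed and a following space 
is kept.  In "sp" mode a space after \dots or \ldots is dropped, as is 
the space ending \sp itself (an assumption needed to get "1... 10").

```python
import re

# Assumed plain-text renderings for the five no-argument macros; a real
# renderer would produce different output for HTML, LaTeX, etc.
NO_ARG = {"cr": "\n", "dots": "...", "ldots": "...", "R": "R", "tab": "\t"}

# A macro name is a backslash followed by one or more letters.
MACRO = re.compile(r"\\([A-Za-z]+)")


def render(text: str, mode: str) -> str:
    """Render a string that uses only the no-argument macros.

    mode="gobble": an empty "{}" right after a no-arg macro is dropped by
        the lexer; a space after the macro is kept as ordinary text.
    mode="sp": one space right after \\dots, \\ldots or \\sp only ends the
        macro name and is dropped; the hypothetical \\sp emits a space.
    All other characters, including braces, are copied through unchanged
    (this toy does not model Rd's real brace handling).
    """
    out = []
    i = 0
    while i < len(text):
        m = MACRO.match(text, i)
        if m is None:
            out.append(text[i])  # ordinary character
            i += 1
            continue
        name = m.group(1)
        i = m.end()
        if mode == "sp" and name == "sp":
            out.append(" ")
        elif name in NO_ARG:
            out.append(NO_ARG[name])
        else:
            raise ValueError(f"unknown macro \\{name}")
        if mode == "gobble" and text.startswith("{}", i):
            i += 2  # swallow the empty group: a break that inserts nothing
        elif (mode == "sp" and name in ("dots", "ldots", "sp")
              and text.startswith(" ", i)):
            i += 1  # this space only terminated the macro name
    return "".join(out)
```

With this sketch, render(r"1\dots{}10", "gobble") gives "1...10" and 
render(r"1\dots 10", "gobble") gives "1... 10", while 
render(r"1\dots 10", "sp") gives "1...10" and 
render(r"1\dots\sp 10", "sp") gives "1... 10".  The {} option keeps 
spaces significant after every no-arg macro; the \sp option only 
changes \dots and \ldots.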

Duncan Murdoch

> 
> We could relax things a lot and allow balanced braces as no-ops in 
> Latex-like mode, but that would miss some typos.  I fixed typos in 10 
> files in r46908, and at least one of those was caught this way, in 
> methods/man/Classes.Rd.  It would also introduce an ambiguity, because 
> \eqn and \deqn *are* going to occur in Latex-like mode.  So
> 
> \eqn{foo}{}bar
> 
> could be either the two-arg version or the one-arg version followed by a 
> no-op before the bar.  (By default, Bison would resolve it as the 
> two-arg version.)  And I think it would be tricky to write the 
> parser so that {} was handled differently in Latex-like mode from the 
> way it's handled in the other modes.  (The other modes count braces and 
> echo them out.)
> 
> There are currently only 5 macros which take no args:  \cr, \dots, 
> \ldots, \R, and \tab.  I think the issue will only arise with \dots and 
> \ldots.  So my preferred decision would be to push this up a level: 
> when the parsed Rd is interpreted, a space following \dots or \ldots is 
> dropped.  To allow for a user who wants a space, we should introduce a 6th 
> no-argument macro, \sp.  Then "1\dots 10" will be rendered as "1...10"
> and "1\dots\sp 10" will be rendered as "1... 10".
> 
> Duncan Murdoch
> 
>> 
>> Thanks,
>> Simon
>> 
>> 
>> On Nov 13, 2008, at 11:02 , Duncan Murdoch wrote:
>> 
>>> I've just committed the parse_Rd() function to R-devel.  This is a  
>>> parser for Rd files, described in
>>>
>>> http://developer.r-project.org/parseRd.pdf
>>>
>>> It is not identical to the current parser, and about a dozen of the  
>>> base man pages currently signal syntax errors.  It also detected  
>>> errors in 10 files that were errors according to both definitions  
>>> but were missed by the current system; I've already fixed  
>>> those.  I plan to patch the rest soon so that they work in both  
>>> systems.  The differences between the two systems are described in  
>>> the document above.
>>>
>>> I would like to hear comments about the changes -- some of them are  
>>> still optional.  I will be continuing to work on support functions  
>>> for the parser, e.g. the print routine is currently quite primitive.
>>>
>>> I expect there may be incompatibilities with platforms I haven't  
>>> tested.  I developed the parser on Windows, and have tested  
>>> it on a Linux system.  There may be problems handling Rd files with  
>>> unusual encodings (UTF-8 and Latin1 should be supported, but I don't  
>>> know about others, and haven't even tested those yet).
>>>
>>> Duncan Murdoch
>>>
>>> ______________________________________________
>>> R-devel at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-devel
>>>
>>>
> 