[Rd] Experimental Rd parser in trunk.

Duncan Murdoch murdoch at stats.uwo.ca
Thu Nov 13 19:44:50 CET 2008


On 11/13/2008 11:51 AM, Simon Urbanek wrote:
> Duncan,
> 
> I had a quick look at the parser differences and I'm worried about  
> points 1. and 2. (on p.6) -- does that imply that \R{} is illegal and  
> so is any \foo{} for any macro \foo that doesn't take any arguments?  
> IMHO that would be fatal (if I understand it correctly), since that  
> construct is very often used (and I know of no alternatives) in cases  
> where you are referencing a macro that is followed by something that  
> is not a space. E.g.: 1\foo{}2 cannot be written as 1\foo2 as per 6.  
> so if \foo{} is disallowed there is no way to call \foo between 1 and  
> 2 when you don't want any spaces to be generated).
> Maybe I'm just interpreting it incorrectly, so I just wanted to point  
> out that issue.

Thanks for the comment.  You are interpreting it correctly, and that is 
something that probably needs to change.

The reasoning behind the current choice is that macros with optional 
arguments are ambiguous:  for example, in R code, {} might be part of 
the code, not something for the Rd parser.  We currently have \eqn and 
\deqn that have one or two args, but they're not going to occur in R 
code, so things currently work.  (But if you want to see ugly Bison 
coding, look at how those VERBMACRO2 macros are handled.  The Rd format 
is not easy to parse, being a mix of latex-like stuff, R code, and just 
about anything else in verbatim sections.)

So if there are circumstances where \foo needs an arg, I'd strongly 
prefer to say that it *always* requires one, rather than letting the 
arg be optional.

If we say that \foo never takes an arg, we'll need a way to distinguish 
between the following space being significant or not.  One way is to 
allow {} or some other marker that signals a break without inserting 
anything, and is only interpreted in Latex-like mode.  Another way (that 
I prefer) is described below.

We could relax things a lot, and allow balanced braces as no-ops in 
Latex-like mode, but that will miss some typos.  I fixed typos in 10 
files in r46908, and at least one of those was caught this way, in 
methods/man/Classes.Rd.  It would also introduce an ambiguity, because 
\eqn and \deqn *are* going to occur in Latex-like mode.  So

\eqn{foo}{}bar

could be either the two-arg version or the one-arg version followed by a 
no-op before the bar.  (The default handling in Bison is that it would 
be the two-arg version.)  And I think it would be tricky to write the 
parser so that {} was handled differently in Latex-like mode from the 
way it's handled in the other modes.  (The other modes count braces and 
echo them out.)
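
To make the ambiguity concrete, here is a minimal Python sketch.  It is 
not the Bison grammar: read_group and parse_eqn are hypothetical helpers 
made up for illustration, and the greedy flag just stands in for the 
choice between the two readings of \eqn{foo}{}bar.

```python
def read_group(text, pos):
    """Return (content, next_pos) for a balanced {...} group at pos, else None."""
    if pos >= len(text) or text[pos] != "{":
        return None
    depth = 0
    for i in range(pos, len(text)):
        if text[i] == "{":
            depth += 1
        elif text[i] == "}":
            depth -= 1
            if depth == 0:
                return text[pos + 1:i], i + 1
    return None  # unbalanced braces


def parse_eqn(text, greedy=True):
    """Parse text that starts with \\eqn; return (args, remaining_text).

    greedy=True  : a second {...} group is taken as the second argument.
    greedy=False : \\eqn takes one argument, and any empty {} that follows
                   is skipped as a no-op (an assumed relaxed rule).
    """
    if not text.startswith("\\eqn"):
        raise ValueError("expected \\eqn")
    pos = len("\\eqn")
    first = read_group(text, pos)
    if first is None:
        raise ValueError("\\eqn needs at least one argument")
    args = [first[0]]
    pos = first[1]
    if greedy:
        second = read_group(text, pos)
        if second is not None:
            args.append(second[0])
            pos = second[1]
    else:
        while text.startswith("{}", pos):
            pos += 2
    return args, text[pos:]


two_arg = parse_eqn(r"\eqn{foo}{}bar", greedy=True)
one_arg = parse_eqn(r"\eqn{foo}{}bar", greedy=False)
```

Both calls consume the same characters and leave "bar" behind, but the 
first reports two arguments (the second one empty) and the other reports 
one.  The input alone can't tell you which was meant.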

There are currently only 5 macros which take no args:  \cr, \dots, 
\ldots, \R, and \tab.  I think the issue will only arise with \dots and 
\ldots.  So my preferred decision would be to push this up a level: 
when the code is interpreted, \dots and \ldots are not followed by a 
space.  To allow for a user who wants a space, we should introduce a 6th 
no-argument macro, \sp.  Then "1\dots 10"  will be rendered as "1...10"
and "1\dots\sp 10" will be rendered as "1... 10".
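
A rough Python sketch of that rendering rule, just to show the intended 
behaviour.  The render function and the regular expression are 
illustrative only; in particular, having \sp also swallow the blank that 
terminates it is my assumption, chosen so the second example yields a 
single space.

```python
import re

# A macro name, not followed by another letter, plus any blanks after it.
# The blanks only terminate the macro name, so they are swallowed.
MACRO = re.compile(r"\\(ldots|dots|sp)(?![A-Za-z])[ \t]*")

# What each no-argument macro renders as in this sketch.
REPLACEMENT = {"dots": "...", "ldots": "...", "sp": " "}


def render(text):
    """Render \\dots, \\ldots and the proposed \\sp in Latex-like text."""
    return MACRO.sub(lambda m: REPLACEMENT[m.group(1)], text)


print(render(r"1\dots 10"))
print(render(r"1\dots\sp 10"))
```

With this rule the author never needs {} after \dots: the blank after 
the macro is purely a delimiter, and \sp is the explicit way to ask for 
a space.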

Duncan Murdoch

> 
> Thanks,
> Simon
> 
> 
> On Nov 13, 2008, at 11:02 , Duncan Murdoch wrote:
> 
>> I've just committed the parse_Rd() function to R-devel.  This is a  
>> parser for Rd files, described in
>>
>> http://developer.r-project.org/parseRd.pdf
>>
>> It is not identical to the current parser, and about a dozen of the  
>> base man pages currently signal syntax errors.  It also detected  
>> errors in 10 files that were errors according to both definitions,  
>> but were missed by the current system, and I've already fixed  
>> those.  I plan to patch the rest so that they work in both systems  
>> soon.  The differences between the two systems are described in the  
>> document above.
>>
>> I would like to hear comments about the changes -- some of them are  
>> still optional.  I will be continuing to work on support functions  
>> for the parser, e.g. the print routine is currently quite primitive.
>>
>> I expect there may be incompatibilities with platforms on which I  
>> haven't tested.  I developed the parser on Windows, and have tested  
>> it on a Linux system.  There may be problems handling Rd files with  
>> unusual encodings (UTF-8 and Latin1 should be supported, but I don't  
>> know about others, and haven't even tested those yet).
>>
>> Duncan Murdoch
>>
>> ______________________________________________
>> R-devel at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-devel
>>
>>


