[Rd] Experimental Rd parser in trunk.
Duncan Murdoch
murdoch at stats.uwo.ca
Thu Nov 13 20:41:17 CET 2008
Just one additional comment inline below:
On 11/13/2008 1:44 PM, Duncan Murdoch wrote:
> On 11/13/2008 11:51 AM, Simon Urbanek wrote:
>> Duncan,
>>
>> I had a quick look at the parser differences and I'm worried about
>> points 1. and 2. (on p.6) -- does that imply that \R{} is illegal and
>> so is any \foo{} for any macro \foo that doesn't take any arguments?
>> IMHO that would be fatal (if I understand it correctly), since that
>> construct is very often used (and I know of no alternatives) in cases
>> where you are referencing a macro that is followed by something that
>> is not a space. E.g., 1\foo{}2 cannot be written as 1\foo2 as per
>> point 6, so if \foo{} is disallowed there is no way to call \foo
>> between 1 and 2 when you don't want any spaces to be generated.
>> Maybe I'm just interpreting it incorrectly, so I just wanted to point
>> out that issue.
>
> Thanks for the comment. You are interpreting it correctly, and that is
> something that probably needs to change.
>
> The reasoning behind the current choice is that macros with optional
> arguments are ambiguous: for example, in R code, {} might be part of
> the code, not something for the Rd parser. We currently have \eqn and
> \deqn that have one or two args, but they're not going to occur in R
> code, so things currently work. (But if you want to see ugly Bison
> coding, look at how those VERBMACRO2 macros are handled. The Rd format
> is not easy to parse, being a mix of latex-like stuff, R code, and just
> about anything else in verbatim sections.)
>
> So I'd really strongly prefer to say that \foo *always* requires an arg,
> rather than let it be optional, if there are circumstances where it
> needs one.
>
> If we say that \foo never takes an arg, we'll need a way to distinguish
> between the following space being significant or not. One way is to
> allow {} or some other marker that signals a break without inserting
> anything, and is only interpreted in Latex-like mode. Another way (that
> I prefer) is described below.
I should say that allowing {} to immediately follow one of the 5 no-arg
macros, and having it gobbled up by the lexer, would be relatively easy
to implement. So then the two examples below could be coded as
"1\dots{}10" versus "1\dots 10", which I think is what you were asking
for. I have a mild preference for adding \sp (I don't like special
cases), but not a strong one.
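
To make the "gobbled up by the lexer" idea concrete, here is a minimal
sketch in Python (not the actual Bison/C lexer). The function name lex,
the token labels, and the rule that a macro name ends at the first
non-alphanumeric character are all assumptions made for illustration;
only the list of five no-arg macros comes from this thread.

```python
import re

# The five no-argument macros named in this thread.
# Assumption: a macro name ends at the first non-alphanumeric character.
# An empty {} pair directly after the name is consumed with the macro.
MACRO_RE = re.compile(r"\\(cr|dots|ldots|R|tab)(?![A-Za-z0-9])(?:\{\})?")


def lex(text):
    """Split text into ("TEXT", ...) and ("MACRO", ...) tokens.

    Hypothetical helper: an empty {} that immediately follows a no-arg
    macro is swallowed here, so later stages never see it.
    """
    tokens = []
    pos = 0
    for m in MACRO_RE.finditer(text):
        if m.start() > pos:
            tokens.append(("TEXT", text[pos:m.start()]))
        tokens.append(("MACRO", "\\" + m.group(1)))
        pos = m.end()
    if pos < len(text):
        tokens.append(("TEXT", text[pos:]))
    return tokens


print(lex(r"1\dots{}10"))  # the {} acts only as a separator
print(lex(r"1\dots 10"))   # the space stays in the following text
```

With this approach the {} never reaches the grammar, so the special
case is confined to the lexer.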
Duncan Murdoch
>
> We could relax things a lot, and allow balanced braces as no-ops in
> Latex-like mode, but that will miss some typos. I fixed typos in 10
> files in r46908, and at least one of those was caught this way, in
> methods/man/Classes.Rd. It would also introduce an ambiguity, because
> \eqn and \deqn *are* going to occur in Latex-like mode. So
>
> \eqn{foo}{}bar
>
> could be either the two-arg version or the one-arg version followed by a
> no-op before the bar. (The default handling in Bison is that it would
> be the two-arg version.) And I think it would be tricky to write the
> parser so that {} was handled differently in Latex-like mode from the
> way it's handled in the other modes. (The other modes count braces and
> echo them out.)
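
The two readings of \eqn{foo}{}bar can be shown with a small Python
sketch. This is not the Bison grammar; read_group, parse_eqn and the
greedy flag are hypothetical names for illustration. The greedy branch
mirrors the "shift" choice Bison makes by default on a shift/reduce
conflict.

```python
def read_group(text, pos):
    """Return (contents, next_pos) for a balanced {...} group at pos, else None."""
    if pos >= len(text) or text[pos] != "{":
        return None
    depth = 0
    for i in range(pos, len(text)):
        if text[i] == "{":
            depth += 1
        elif text[i] == "}":
            depth -= 1
            if depth == 0:
                return text[pos + 1:i], i + 1
    return None  # unbalanced braces


def parse_eqn(text, greedy=True):
    """Parse text that starts with \\eqn; return (args, remaining_text).

    greedy=True  : take a second {...} group as the second argument.
    greedy=False : stop after one argument and leave the rest untouched.
    """
    prefix = "\\eqn"
    if not text.startswith(prefix):
        raise ValueError("text must start with \\eqn")
    first = read_group(text, len(prefix))
    if first is None:
        raise ValueError("\\eqn requires at least one argument")
    args, pos = [first[0]], first[1]
    if greedy:
        second = read_group(text, pos)
        if second is not None:
            args.append(second[0])
            pos = second[1]
    return args, text[pos:]


print(parse_eqn(r"\eqn{foo}{}bar", greedy=True))   # two-arg reading
print(parse_eqn(r"\eqn{foo}{}bar", greedy=False))  # one-arg reading, {} left over
```

Both results are valid readings of the same input, which is the
ambiguity that allowing {} as a general no-op would introduce.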
>
> There are currently only 5 macros which take no args: \cr, \dots,
> \ldots, \R, and \tab. I think the issue will only arise with \dots and
> \ldots. So my preferred decision would be to push this up a level:
> when the code is interpreted, \dots and \ldots are not followed by a
> space. To allow for a user who wants a space, we should introduce a 6th
> no-argument macro, \sp. Then "1\dots 10" will be rendered as "1...10"
> and "1\dots\sp 10" will be rendered as "1... 10".
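
The \sp proposal can be sketched at the rendering stage like this, in
Python. The function name render is hypothetical, and two details are
assumptions: that \dots and \ldots swallow all spaces after them, and
that the single space after \sp only terminates the macro name.

```python
import re


def render(text):
    """Render \\dots, \\ldots and the proposed \\sp to plain text.

    Assumptions (not from the parser itself): a macro name ends at the
    first non-alphanumeric character; \\dots and \\ldots drop any spaces
    that follow; \\sp emits exactly one space and drops one delimiter space.
    """
    text = re.sub(r"\\l?dots(?![A-Za-z0-9]) *", "...", text)
    text = re.sub(r"\\sp(?![A-Za-z0-9]) ?", " ", text)
    return text


print(render(r"1\dots 10"))     # 1...10
print(render(r"1\dots\sp 10"))  # 1... 10
```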
>
> Duncan Murdoch
>
>>
>> Thanks,
>> Simon
>>
>>
>> On Nov 13, 2008, at 11:02 , Duncan Murdoch wrote:
>>
>>> I've just committed the parse_Rd() function to R-devel. This is a
>>> parser for Rd files, described in
>>>
>>> http://developer.r-project.org/parseRd.pdf
>>>
>>> It is not identical to the current parser, and about a dozen of the
>>> base man pages currently signal syntax errors. It also detected
>>> errors in 10 files that were errors according to both definitions,
>>> but were missed by the current system, and I've already fixed
>>> those. I plan to patch the rest so that they work in both systems
>>> soon. The differences between the two systems are described in the
>>> document above.
>>>
>>> I would like to hear comments about the changes -- some of them are
>>> still optional. I will be continuing to work on support functions
>>> for the parser, e.g. the print routine is currently quite primitive.
>>>
>>> I expect there may be incompatibilities with platforms on which I
>>> haven't tested. I developed the parser on Windows, and have tested
>>> it on a Linux system. There may be problems handling Rd files with
>>> unusual encodings (UTF-8 and Latin1 should be supported, but I don't
>>> know about others, and haven't even tested those yet).
>>>
>>> Duncan Murdoch
>>>
>>> ______________________________________________
>>> R-devel at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-devel
>>>
>>>