[Rd] Why does the lexical analyzer drop comments ?

Fri Mar 20 20:18:24 CET 2009

On 3/20/2009 2:56 PM, romain.francois at dbmail.com wrote:
> It happens in the token function in gram.c: 
> 
>     c = SkipSpace();
>     if (c == '#') c = SkipComment();
> 
> and then SkipComment goes like that: 
> 
> static int SkipComment(void)
> {
>     int c;
>     while ((c = xxgetc()) != '\n' && c != R_EOF) ;
>     if (c == R_EOF) EndOfFile = 2;
>     return c;
> }
> 
> which effectively drops comments.
> 
> Would it be possible to keep the information somewhere ? 
> 
> The source code says this: 
> 
>  *  The function yylex() scans the input, breaking it into
>  *  tokens which are then passed to the parser.  The lexical
>  *  analyser maintains a symbol table (in a very messy fashion).
> 
> So my question is: could we use this symbol table to keep track of, say, COMMENT tokens?
> 
> Why would I even care about that ? I'm writing a package that will
> perform syntax highlighting of R source code based on the output of the
> parser, and it seems a waste to drop the comments. 
> 
> And also, when you print a function to the R console, you don't get the comments, and some of them might be useful to the user.
> 
> Am I mad if I contemplate looking into this ? 

Comments are syntactically the same as whitespace.  You don't want them 
to affect the parsing.
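
To illustrate the point, here is a minimal sketch (not the code in gram.c) 
of a comment skipper that saves the comment text on the side while still 
handing the parser exactly the character SkipComment would return.  The 
Lexer struct, next_char(), skip_comment_keep() and EOF_MARK are all 
hypothetical stand-ins for illustration; the EndOfFile bookkeeping of the 
real function is left out.

```c
#include <stddef.h>

#define EOF_MARK (-1)              /* hypothetical stand-in for R_EOF */

/* Hypothetical lexer state: the input plus a side buffer for the last comment. */
typedef struct {
    const char *src;               /* NUL-terminated input text */
    size_t pos;                    /* index of the next unread character */
    char comment[256];             /* text of the most recent comment */
    size_t comment_len;
} Lexer;

/* Stand-in for xxgetc(): return the next character, or EOF_MARK at the end. */
int next_char(Lexer *lx)
{
    char c = lx->src[lx->pos];
    if (c == '\0') return EOF_MARK;
    lx->pos++;
    return (unsigned char)c;
}

/* Call after the leading '#' has been read.  The comment text is copied
   into lx->comment (truncated if too long), and the return value is the
   terminating '\n' or EOF_MARK, so the parser sees nothing different. */
int skip_comment_keep(Lexer *lx)
{
    int c;
    lx->comment_len = 0;
    lx->comment[lx->comment_len++] = '#';
    while ((c = next_char(lx)) != '\n' && c != EOF_MARK) {
        if (lx->comment_len + 1 < sizeof lx->comment)
            lx->comment[lx->comment_len++] = (char)c;
    }
    lx->comment[lx->comment_len] = '\0';
    return c;
}
```

The grammar never sees a COMMENT token here; whatever wants the comments 
(a highlighter, say) would read them from the side buffer.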

If you're doing syntax highlighting, you can find the whitespace 
(comments included) by looking at the srcref records, and then parse it 
to determine what isn't being counted as tokens.  (I think you'll find a 
few things there besides whitespace, but it is a fairly limited set, so 
it shouldn't be too hard to recognize.)
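
A rough sketch of that gap scan, under assumptions: the Span type and 
find_comments() are hypothetical, and the spans are taken to be sorted, 
non-overlapping byte ranges of the text the parser accounted for.  How 
such ranges would be derived from srcref records is not shown here.

```c
#include <stddef.h>

/* Hypothetical half-open byte range [start, end) into the source text. */
typedef struct { size_t start, end; } Span;

/* Look only at the text that no covered span accounts for, and report each
   '#' comment found there (from the '#' up to, not including, the newline).
   Returns the number of comments stored in out, at most max_out. */
size_t find_comments(const char *src, size_t len,
                     const Span *covered, size_t ncovered,
                     Span *out, size_t max_out)
{
    size_t n = 0, pos = 0;
    for (size_t t = 0; t <= ncovered; t++) {
        /* The gap runs from the end of the previous span to the start of
           the next one, or to the end of the text after the last span. */
        size_t gap_end = (t < ncovered) ? covered[t].start : len;
        size_t i = pos;
        while (i < gap_end) {
            if (src[i] == '#') {
                size_t j = i;
                while (j < gap_end && src[j] != '\n') j++;
                if (n == max_out) return n;
                out[n].start = i;
                out[n].end = j;
                n++;
                i = j;
            } else {
                i++;               /* whitespace or another uncounted character */
            }
        }
        if (t < ncovered) pos = covered[t].end;
    }
    return n;
}
```

Because a '#' inside a string literal lies within a covered span, it is 
never mistaken for a comment; only the uncovered text is searched.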

The Rd parser is different, because in an Rd file, whitespace is 
significant, so it gets kept.

Duncan Murdoch