[R] How to comment in R

Wed Feb 11 22:37:39 CET 2009

Apparently I was not clear about my intention in my original reply.  I have no problem with you or anyone else implementing this in whatever way.  Someone could even use Friedl's regex to write a preprocessor in R; he has already done the hard work, provided you follow the rules that he based his regex on.  I think the hard (non-trivial) part is agreeing on the rules, and then dealing with all the future posters who expected a different set of rules.

Additional comments below

> -----Original Message-----
> From: Wacek Kusnierczyk [mailto:Waclaw.Marcin.Kusnierczyk at idi.ntnu.no]
> Sent: Wednesday, February 11, 2009 12:45 PM
> To: Greg Snow
> Cc: R help
> Subject: Re: [R] How to comment in R

[snip]

> 
> hey, be fair.  i was talking about the perl-style pod block comments,
> where it *is* very easy to implement. 

Your phrase that I responded to was:
"an extension to the parser that would accept multiline start-end comment
tags, be it c-style /* */, perl-style =pod =cut, whatever, should be
fairly  trivial to implement."

I interpreted the part after the last comma as referring to the whole list; you apparently meant it to refer only to the last element of the list (English list, not R list).  I could show this to my friend who is a high school English teacher and see what he thinks, but I would prefer to just blame the English language (or the American corruption thereof) rather than each other.

> just any* line that starts with
> =[^\s] (including =cut; the pattern is perhaps just slightly more
> complex, but it doesn't really matter here) in a non-block-comment
> context starts a block comment, and only a line with =cut\s in a
> block-comment context ends a block comment.  as clear as that.  #=head
> does not start a block comment, and #foo =cut does not stop a block
> comment.  is this really difficult to conceive,  agree on, and
> implement?
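
For concreteness, the rule as quoted above could be sketched in R roughly as follows.  This is only a sketch under stated assumptions: the name strip_pod_blocks is made up for illustration, the input is a character vector of source lines (as readLines() would return), and it does nothing about the footnoted case of markers inside multi-line strings.

```r
# Hypothetical helper (not part of R): drop pod-style block comments
# from a character vector of source lines.
strip_pod_blocks <- function(lines) {
  keep <- logical(length(lines))   # all FALSE to begin with
  in_block <- FALSE
  for (i in seq_along(lines)) {
    if (in_block) {
      # Only a line starting with "=cut" followed by whitespace or
      # end of line closes the block; the closing line is dropped too.
      if (grepl("^=cut(\\s|$)", lines[i], perl = TRUE)) in_block <- FALSE
    } else if (grepl("^=\\S", lines[i], perl = TRUE)) {
      # Any line starting with "=" plus a non-space character opens a block.
      in_block <- TRUE
    } else {
      keep[i] <- TRUE
    }
  }
  if (in_block) warning("unterminated block comment")
  lines[keep]
}

src <- c(
  "x <- 1",
  "=pod",
  "free text, including #foo =cut which does not end the block",
  "=cut",
  "#=head is an ordinary comment",
  "y <- 2"
)
strip_pod_blocks(src)
```

On this input only the lines "x <- 1", "#=head is an ordinary comment" and "y <- 2" survive.  The end-of-line alternative in the closing pattern is my own addition, because lines read with readLines() carry no trailing newline for \s to match.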

OK, implement your idea of #start and #end at the beginning of lines; that is probably easy to implement.

Here is my prediction of what will happen when/if that is added.

You will use it as intended, putting only free text between the #start and #end tags so that it is clear that it is documentation/comments.

But someone else, upon learning of the existence of #start and #end, will start using them to skip over sections of code that don't work, or that were redone a different way, etc.  Then they will eventually try to comment out a block that already has a comment in it (nesting), or misread something because the line containing #start or #end has scrolled off the screen, or want to start/end the comment part way through a line.  When they discover the problems caused by wanting the #start #end construct to do more than you intended, they will post asking why it doesn't behave like /* */ in SAS/C/etc., and this whole discussion will start again.
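
To make the nesting case concrete, here is a rough sketch in R of a checker that flags markers a user would probably find surprising.  Everything in it is an assumption on my part: the name check_block_markers is invented, and it takes the rules to be "a line starting with #start opens a block, the next line starting with #end closes it, no nesting".

```r
# Hypothetical checker (not part of R): report #start/#end markers
# that do not pair up under simple, non-nesting rules.
check_block_markers <- function(lines) {
  problems <- character(0)
  open_at <- NA_integer_   # line number of the currently open #start
  for (i in seq_along(lines)) {
    if (grepl("^#start\\b", lines[i], perl = TRUE)) {
      if (!is.na(open_at)) {
        problems <- c(problems,
          sprintf("line %d: #start inside block opened at line %d", i, open_at))
      } else {
        open_at <- i
      }
    } else if (grepl("^#end\\b", lines[i], perl = TRUE)) {
      if (is.na(open_at)) {
        problems <- c(problems, sprintf("line %d: #end without #start", i))
      }
      open_at <- NA_integer_   # the first #end closes the block
    }
  }
  if (!is.na(open_at)) {
    problems <- c(problems, sprintf("line %d: #start never closed", open_at))
  }
  problems
}

src <- c(
  "#start",
  "old_fit <- lm(y ~ x)",
  "#start",
  "a note that was already commented out",
  "#end",
  "new_fit <- glm(y ~ x)",
  "#end"
)
check_block_markers(src)
```

With these rules the first #end (line 5) closes the outer block, so line 6 is live code again and the final #end is a stray marker.  The checker reports the inner #start on line 3 and the unmatched #end on line 7, which is exactly the kind of surprise I expect people to post about.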

Is that enough reason not to implement it?  Probably not, but don't claim that you were not warned.

To get something added to the main R you need to convince someone that the benefits outweigh the costs.  My attempted point is that the costs include more than just the implementation: there is also the documentation, and dealing with all the people who don't read the documentation and expect it to behave differently.

I believe that there is a small benefit to what you propose, but in my mind it does not outweigh the potential cost (my opinion on this does not really matter, as I am not part of R-core and will have little or no impact on what is or is not added).  If others whose opinions on this do matter are not jumping on the bandwagon to implement this, then it is probably for the same reason: they are not convinced that the benefits outweigh the costs.  You can:

1: live with it (or without it)
2: convince them that the benefits are greater
3: convince them that the costs are less

> * not inside string delimiters, for example.  already solved in r for
> single line comments-like multiline strings.
> 
> 
> > While the parser can process the comments without using regular
> > expressions, some of the issues that Friedl brings up still need to be
> > considered in deciding the rules.  Implementing this in the parser may
> > well be trivial once the rules are decided on (but way beyond me), but
> > I still think that deciding on the rules and documenting them is far
> > from trivial.
> >
> 
> might be helpful to see some concrete counterexamples.

I don't understand what you want as a counterexample.

I think the fact that nobody has committed in either direction on Duncan's example supports my point.

> > I remember having some C code that compiled fine and did as intended
> > with one compiler, then when I tried compiling with a different
> > compiler it threw an error based on the commenting (probably the
> > difference was in the preprocessors, not the compilers), so the 2
> > different versions of the C compiler/preprocessor did not even agree on
> > the rules (I don't remember emacs complaining either way).
> >
> 
> i remember a c++ compiler that would ignore for loops (i had to replace
> all of them with while...).
> 
> r is already complex enough, both syntactically and semantically, for
> multiline comments to be an outstanding unsurmountable complication.
> 
> 
> > Others have mentioned using sed as another way to add/strip comment
> > markers to regions of code.  Along these lines someone could always use
> > C-style (or PL/I style to be more correct but less common in my
> > experience) comments, then run the code through a C preprocessor before
> > submitting to R (then you just have to live with the rules of the
> > preprocessor).
> >
> 
> well, cpp is a bit too focused on c.  from man cpp:
> 
> "The C preprocessor is intended to be used only with C, C++, and
> Objective-C source code.  In the past, it has been abused as a general
> text processor.  It will choke on input which does not obey C’s
> lexical rules."
> 
> which does not mean you couldn't be able to tweak it to
> block-commenting r code.
> 
> btw. the sed example was trivialized, it would not treat comment-like
> parts of character strings properly -- but sed is not intended as a
> generic parser, it's still a line-by-line text processor, with some
> features allowing it to consider multiline context, and it's presumably
> best to have the r parser do the job.
> 
> > I have no problem with someone adding this capability to R, I just
> > prefer that R-Core spends their time on higher priority items.  If
> > someone else wants to contribute the change, I won't complain (unless
> > it breaks my existing code, which is unlikely if done properly), I just
> > wanted anyone who was thinking of doing this to think about some of the
> > potential pitfalls so that if they implement it, they implement it
> > well.
> >
> >
> 
> it's *design* that has to be done properly in the first place.

That is what I was trying to say (along with the fact that the design may not be trivial).

[snip]
-- 
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.snow at imail.org
801.408.8111