[Rd] access to R parse tree for Lisp-style macros?

Mon Oct 3 15:42:50 CEST 2005

On 10/3/2005 3:25 AM, Andrew Piskorski wrote:
> R folks, I'm curious about possible support for Lisp-style macros in
> R.  I'm aware of the "defmacro" support for S-Plus and R discussed
> here:
> 
>   http://www.biostat.wustl.edu/archives/html/s-news/2002-10/msg00064.html 
> 
> but that's really just a syntactic short-cut to the run-time use of
> substitute() and eval(), which you could manually put into a function
> yourself if you cared to.  (AKA, not at all equivalent to Lisp
> macros.)  The mlocal() function in mvbutils also has seemingly similar
> macro-using-eval properties:
> 
>   http://cran.r-project.org/src/contrib/Descriptions/mvbutils.html 
>   http://www.maths.lth.se/help/R/.R/library/mvbutils/html/mlocal.html 
> 
> I could of course pre-process R source code, either using a custom
> script or something like M5:
> 
>   http://www.soe.ucsc.edu/~brucem/samples.html
>   http://groups.google.com/group/comp.compilers/browse_thread/thread/8ece2f34620f7957/000475ab31140327
> 
> But that's not what I'm asking about here.  As I understand it,
> Lisp-style macros manipulate the already-parsed syntax tree.  This
> seems very uncommon in non-Lisp languages and environments, but some -
> like Python - do have such support.  (I don't use Python, but I'm told
> that its standard parser APIs are as powerful as Lisp macros, although
> clunkier to use.)
> 
> Is implementing Lisp-style macros feasible in R?  Has anyone
> investigated this or tried to do it?
> 
> What internal representation does R use for its parse tree, and how
> could I go about manipulating it in some fashion, either at package
> build time or at run time, in order to support true Lisp-style macros?

It is like a list of lists, with modes attached that say how they are to 
be interpreted.  parse() returns an object of mode "expression", which 
behaves like a list of function calls, symbols, and atomic constants.  
A function call is stored as a list whose head is the function name, 
with the subsequent entries being the arguments.

The mode may be "expression", or "call", or others, depending on what 
you are actually dealing with.
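
Because a call behaves like a list, its elements can be read and replaced 
with ordinary indexing.  The following is only a minimal sketch of that 
idea (the variable name e is arbitrary): it swaps the head of a parsed 
call and evaluates the result.

```r
# Parse a single expression and pull out the call object.
e <- parse(text = "1 + 2")[[1]]

# The head of the call is the function name, stored as a symbol.
head_sym <- e[[1]]        # the symbol `+`

# Replace the head to turn the addition into a multiplication.
e[[1]] <- as.name("*")

deparse(e)                # "1 * 2"
eval(e)                   # 2
```

This happens at run time on an already-parsed object; nothing here hooks 
into the parser itself.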
> 
> Whenever I try something like this in R:
> 
>   > dput(parse(text="1+2"))
>   expression(1 + 2)
> 
> what I see looks exactly like R code - that '1 + 2' expression doesn't
> look very "parsed" to me.  Is that really it, or is there some sort of
> Scheme-like parse tree hiding underneath?  I see that the interactive
> Read-Eval-Print loop basically calls R_Parse1() in "src/main/gram.c",
> but from there I'm pretty much lost.

There's a parse tree underneath.  R is being helpful and deparsing it 
for you for display purposes.
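
A quick way to convince yourself of this, as a small sketch, is to ask 
about the object's class and length rather than printing it; deparse() 
is what produces the source-like text.

```r
p <- parse(text = "1+2")
class(p)           # "expression"
length(p)          # 1: one top-level expression

cl <- p[[1]]
class(cl)          # "call"
length(cl)         # 3: the function name plus two arguments

# Printing goes through the deparser, which regenerates source text.
deparse(cl)        # "1 + 2"
```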

To see it as a list, use "as.list" to strip off the mode, e.g.

 > as.list(parse(text="1+2"))
[[1]]
1 + 2

# A list containing one expression.  Expand it:

 > as.list(parse(text="1+2")[[1]])
[[1]]
`+`

[[2]]
[1] 1

[[3]]
[1] 2

# A function call to `+` with two arguments.  The arguments are atomic.

Use "mode" to work out how these are interpreted:

 > mode(parse(text="1+2"))
[1] "expression"
 > mode(parse(text="1+2")[[1]])
[1] "call"
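
Putting these pieces together, a tree transformation of the kind a macro 
would perform can be sketched as a recursive walk over calls.  The helper 
replace_symbol() below is hypothetical, not part of R, and it ignores 
corner cases such as empty arguments (as in x[1, ]) and the formals of 
function definitions.

```r
# Hypothetical helper: replace every occurrence of the symbol named
# `from` with the object `to`, descending into nested calls.
replace_symbol <- function(x, from, to) {
  if (is.name(x)) {
    if (identical(x, as.name(from))) to else x
  } else if (is.call(x)) {
    # Rebuild the call from its transformed elements.
    as.call(lapply(as.list(x), replace_symbol, from = from, to = to))
  } else {
    x  # constants and anything else are left untouched
  }
}

e  <- quote(a * (a + 1))
e2 <- replace_symbol(e, "a", 3)
deparse(e2)        # "3 * (3 + 1)"
eval(e2)           # 12
```

The rewrite happens when replace_symbol() is called, so this is still 
run-time manipulation of the parse tree rather than expansion at parse 
time as in Lisp.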

> 
> Also, what happens at package build time?  I know that R CMD INSTALL
> generates binary *.rdb and *.rdx files for my package, but what do
> those do exactly, and how do they relate to the REPL and R_Parse1()?
> 
> Finally, are there any docs describing the design and implementation
> of the R internals?  Should I be looking anywhere other than the R
> developer page here?:

The source code is sometimes the best place for low level details like 
this.  The R Language Definition manual sometimes gives low level 
details, but it is uneven in its coverage; I forget if it covers this.

Duncan Murdoch