[Rd] Mapping parse tree elements to tokens
murdoch.duncan at gmail.com
Wed Jul 29 20:47:40 CEST 2015
On 29/07/2015 2:30 PM, Michael Lawrence wrote:
> Probably need a generic tree based on "ParseNode" objects that
> associate the line information with the symbol (for leaf nodes). As
> Duncan notes, it should be possible to gather that from the table.
> But it would be nice if there was an "expr" column in the parse data
> in addition to "text". It would contain the parsed object.
> Otherwise, to use the table, one is often reparsing the text, which
> just seems redundant and inconvenient.
Can you (both Jim and Michael) describe the uses you might have for
this? There are lots of possible changes that could make this easier:
- attach the id to each item in the parse tree, as the parser package did.
(Bad idea for general use which is why I dropped it, but
it could be done as a special option to parse, if you aren't planning to
evaluate the expression.)
- give the index into the parse tree of each item, i.e. c(1,1),
c(1,2), c(1,3) in the example below, or just 1,2,3 along with a function
to reconstruct the full path.
- give a copy of the branch of the parse tree, as Michael suggests.
etc. Which is best for your purposes?
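As a rough sketch of the second option (an index into the parse tree for each item), the following walks a parsed expression and records the path that could be passed to `[[`. The helper name `index_paths` is hypothetical and not part of base R or utils; it also assumes the code contains no empty arguments (such as `x[, 1]`), which this simple recursion does not handle.

```r
# Hypothetical helper: list every node of a parsed expression together
# with the index path that reaches it, e.g. "1,3,2" for p[[c(1,3,2)]].
index_paths <- function(x, path = integer(0)) {
  here <- data.frame(path = paste(path, collapse = ","),
                     code = paste(deparse(x), collapse = " "),
                     stringsAsFactors = FALSE)
  # Only calls and expression vectors have children worth descending into.
  if (is.call(x) || is.expression(x)) {
    kids <- lapply(seq_along(x),
                   function(i) index_paths(x[[i]], c(path, i)))
    here <- do.call(rbind, c(list(here), kids))
  }
  here
}

p <- parse(text = "x + (y + 1)", keep.source = TRUE)
paths <- index_paths(p)[-1, ]   # drop the row for the expression vector itself
rownames(paths) <- NULL
print(paths)
```

Each `path` value is relative to `p`, so `p[[c(1,3,2)]]` retrieves the node listed with path `1,3,2`. This only enumerates the tree; it does not by itself say which row of the getParseData() table a node came from.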
> On Wed, Jul 29, 2015 at 9:43 AM, Duncan Murdoch
> <murdoch.duncan at gmail.com> wrote:
> > On 29/07/2015 12:13 PM, Jim Hester wrote:
> >> I would like to map the parsed tokens obtained from utils::getParseData()
> >> to the parse tree and elements obtained by base::parse().
> >> It looks like back when this code was in the parser package the parse()
> >> function annotated the elements in the tree with their id, which would
> >> allow you to perform this mapping. However when the code was included in R
> >> this functionality was removed.
> > Yes, not all elements of the parse tree can legally have attributes
> > attached.
> >> ?getParseData states
> >> The ‘id’ values are not attached to the elements of the parse
> >> tree, they are only retained in the table returned by
> >> ‘getParseData’.
> >> Is there another way you can map between the getParseData() tokens and
> >> elements of the parse tree that makes this additional annotation
> >> unnecessary? Or is this simply not possible?
> > I think you can't get to it, though you can get close by looking at the id &
> > parent values in the table. For example,
> > code <- "x + (y + 1)"
> > p <- parse(text=code)
> > getParseData(p)
> >    line1 col1 line2 col2 id parent     token terminal text
> > 15     1    1     1   11 15      0      expr    FALSE
> > 1      1    1     1    1  1      3    SYMBOL     TRUE    x
> > 3      1    1     1    1  3     15      expr    FALSE
> > 2      1    3     1    3  2     15       '+'     TRUE    +
> > 13     1    5     1   11 13     15      expr    FALSE
> > 4      1    5     1    5  4     13       '('     TRUE    (
> > 11     1    6     1   10 11     13      expr    FALSE
> > 5      1    6     1    6  5      7    SYMBOL     TRUE    y
> > 7      1    6     1    6  7     11      expr    FALSE
> > 6      1    8     1    8  6     11       '+'     TRUE    +
> > 8      1   10     1   10  8      9 NUM_CONST     TRUE    1
> > 9      1   10     1   10  9     11      expr    FALSE
> > 10     1   11     1   11 10     13       ')'     TRUE    )
> > Now p is an expression, with the parse tree in p[[1]]. From the table, we
> > can see that the root node has id 15, and 3 nodes have that as a parent.
> > Those would be p[[c(1,1)]], p[[c(1,2)]], p[[c(1,3)]]. The tricky part is
> > the re-ordering: those correspond to `+`, x, and (y+1) respectively, not
> > the order they appear in the original source or in the table. Generally the
> > function call appears first in the parse tree, but I'm not sure you could
> > always recognize which is the function call by looking at the table.
> > Duncan Murdoch
> > ______________________________________________
> > R-devel at r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-devel
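As a rough illustration of the re-ordering described in the quoted message, the sketch below lists the three children of the root node twice: once in source order, taken from the getParseData() table, and once in the order they occur in the parse tree. The variable names are chosen for illustration only. `keep.source = TRUE` is passed explicitly because the default follows `options("keep.source")`, which is FALSE in non-interactive sessions, and getParseData() then returns NULL.

```r
code <- "x + (y + 1)"
p <- parse(text = code, keep.source = TRUE)
pd <- getParseData(p)

# Children of the root node as recorded in the table, in source order.
root <- pd$id[pd$parent == 0]
kids <- pd[pd$parent == root, ]
kids <- kids[order(kids$line1, kids$col1), ]
src_text <- getParseText(pd, kids$id)

# The same three elements as they sit in the parse tree: p[[c(1,1)]],
# p[[c(1,2)]], p[[c(1,3)]], with the function being called coming first.
tree_text <- vapply(as.list(p[[1]]),
                    function(e) paste(deparse(e), collapse = " "),
                    character(1))

print(src_text)
print(tree_text)
```

Matching the two vectors by text works for this small example, but it is not a general mapping: repeated tokens, or calls where the function is not a simple symbol, would need more than a text comparison.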