[Rd] Changes to parser in R-devel

Thu Jul 19 22:41:12 CEST 2012

I'm not sure if there is a bug somewhere; see this example:

getParseData(parse(text='function(x){}'))

  line1 col1 line2 col2 id parent          token terminal     text
1     1    1     1    8  1     11       FUNCTION     TRUE function
2     1    9     1    9  2     11            '('     TRUE        (
3     1   10     1   10  3      5 SYMBOL_FORMALS     TRUE        x
4     1   11     1   11  4     11            ')'     TRUE        )
5     1   12     1   12  6      8            '{'     TRUE        {
6     1   13     1   13  7      8            '}'     TRUE        }
7     1   12     1   12  5     11            '}'     TRUE        {
8     1   12     1   13  8     11           expr    FALSE
9     1    1     1   13 11      0           expr    FALSE

I get an additional { in the 7th row of the 'text' column.

Another problem is that for this empty function below, there will be
an obvious pause if you run it more than once:

getParseData(parse(text='function(){}'))

and you may get wild line/col numbers like this:

   line1 col1     line2 col2 id parent    token terminal     text
1      1    1         1    8  1      9 FUNCTION     TRUE function
2      1    9         1    9  2      9      '('     TRUE        (
3      1   10         1   10  3      9      ')'     TRUE        )
4      1   11         1   11  4      6      '{'     TRUE        {
5      1   12         1   12  5      6      '}'     TRUE        }
6 320024   11 140106360   11 11      9      '}'     TRUE
7      1   11         1   12  6      9     expr    FALSE
8      1    1         1   12  9     11     expr    FALSE

What is worse is it can crash R:

 *** caught segfault ***
address 0x9488c20, cause 'memory not mapped'

Traceback:
 1: parse(text = "function(){}")
 2: getSrcref(x)
 3: getSrcfile(x)
 4: getParseData(parse(text = "function(){}"))

> sessionInfo()
R Under development (unstable) (2012-07-18 r59904)
Platform: i686-pc-linux-gnu (32-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=C                 LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

Regards,
Yihui
--
Yihui Xie <xieyihui at gmail.com>
Phone: 515-294-2465 Web: http://yihui.name
Department of Statistics, Iowa State University
2215 Snedecor Hall, Ames, IA

On Wed, Jul 18, 2012 at 2:31 PM, Duncan Murdoch
<murdoch.duncan at gmail.com> wrote:
> I have just committed (in r59883) some changes to the R parser based on
> Romain Francois' parser package.  Packages that made use of parser will
> hopefully find that the information in base R gives them what they need to
> work with, but the data is not identical to
> what parser recorded (since it was not consistent with some things already
> in R).  One reason for the change was that the parser in the parser package
> was slightly different than the one in R; the hope is that by providing the
> services in R, it will make maintenance easier for things like code
> analysis, pretty printing, etc.
>
> See ?getParseData for details, and if you are maintaining a package that
> depends on parser, feel free to ask me for help in the transition, or make
> suggestions for changes if I've done something that causes you too much
> trouble.
>
> Duncan Murdoch
>
> P.S. to Qiang Li:  as mentioned privately, the goal for this change was to
> reproduce output equivalent to what parser did, so I have not incorporated
> your suggested change to outlaw expressions like "x[[1] ]"  (with an
> embedded space where it shouldn't be).  After things settle down we can
> consider that change and others.
>
> ______________________________________________
> R-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel