[Rd] Changes to parser in R-devel
Duncan Murdoch
murdoch.duncan at gmail.com
Fri Jul 20 00:50:12 CEST 2012
On 12-07-19 4:41 PM, Yihui Xie wrote:
> I'm not sure if there is a bug somewhere; see this example:
There's definitely a bug in the handling of empty lists, such as the
empty list of commands in your first example and the empty list of
arguments in your second. There's a partial workaround currently in
R-devel, but not a complete fix. (The cause is a conversion I missed, from
Romain's 0-based column counting to the usual 1-based counting.)
I expect it will be fixed tomorrow, or sooner.
Duncan Murdoch
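
For anyone who wants to check a given R build for the symptoms reported
below, here is a minimal sketch. check_parse_data() is a hypothetical helper,
not part of R. It assumes ASCII source without tabs, and relies on the
1-based columns that getParseData() documents, so substring() can be applied
to them directly. It flags terminal tokens whose positions fall outside the
source, whose text does not match the source at that position, or whose
position repeats an earlier terminal token.

```r
# Hypothetical helper (not part of R): sanity-check getParseData() output
# against the source text. Assumes ASCII input without tab characters.
check_parse_data <- function(code) {
  # keep.source = TRUE so parse data is recorded in non-interactive runs too
  pd <- getParseData(parse(text = code, keep.source = TRUE))
  src <- strsplit(code, "\n", fixed = TRUE)[[1]]
  term <- pd[pd$terminal, ]

  # Positions must lie inside the source; wild line numbers fail here.
  in_range <- term$line1 >= 1 & term$line2 <= length(src) &
    term$line1 <= term$line2 & term$col1 >= 1 & term$col2 >= 1

  # For tokens on a single line, the recorded text should equal the
  # source characters between col1 and col2 (both 1-based, inclusive).
  single <- in_range & term$line1 == term$line2
  found <- rep(NA_character_, nrow(term))
  found[single] <- substring(src[term$line1[single]],
                             term$col1[single], term$col2[single])

  # Two terminal tokens should never claim the same source range.
  dup <- duplicated(term[, c("line1", "col1", "line2", "col2")])

  ok <- in_range & (!single | found == term$text) & !dup
  term[!ok, ]  # rows that look suspicious; zero rows means nothing flagged
}

bad <- check_parse_data("function(x){}")
print(nrow(bad))
```

On a build where the parse data is correct, the returned data frame should
have no rows. Multi-line tokens are only range-checked here, not compared
against the source text.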
>
> getParseData(parse(text='function(x){}'))
>
>   line1 col1 line2 col2 id parent          token terminal     text
> 1     1    1     1    8  1     11       FUNCTION     TRUE function
> 2     1    9     1    9  2     11            '('     TRUE        (
> 3     1   10     1   10  3      5 SYMBOL_FORMALS     TRUE        x
> 4     1   11     1   11  4     11            ')'     TRUE        )
> 5     1   12     1   12  6      8            '{'     TRUE        {
> 6     1   13     1   13  7      8            '}'     TRUE        }
> 7     1   12     1   12  5     11            '}'     TRUE        {
> 8     1   12     1   13  8     11           expr    FALSE
> 9     1    1     1   13 11      0           expr    FALSE
>
> I get an additional { in the 7th row of the 'text' column.
>
> Another problem is that for this empty function below, there will be
> an obvious pause if you run it more than once:
>
> getParseData(parse(text='function(){}'))
>
> and you may get wild line/col numbers like this:
>
>    line1 col1     line2 col2 id parent    token terminal     text
> 1      1    1         1    8  1      9 FUNCTION     TRUE function
> 2      1    9         1    9  2      9      '('     TRUE        (
> 3      1   10         1   10  3      9      ')'     TRUE        )
> 4      1   11         1   11  4      6      '{'     TRUE        {
> 5      1   12         1   12  5      6      '}'     TRUE        }
> 6 320024   11 140106360   11 11      9      '}'     TRUE
> 7      1   11         1   12  6      9     expr    FALSE
> 8      1    1         1   12  9     11     expr    FALSE
>
> What is worse is it can crash R:
>
> *** caught segfault ***
> address 0x9488c20, cause 'memory not mapped'
>
> Traceback:
> 1: parse(text = "function(){}")
> 2: getSrcref(x)
> 3: getSrcfile(x)
> 4: getParseData(parse(text = "function(){}"))
>
>
> > sessionInfo()
> R Under development (unstable) (2012-07-18 r59904)
> Platform: i686-pc-linux-gnu (32-bit)
>
> locale:
>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
>  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
>  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
>  [7] LC_PAPER=C                 LC_NAME=C
>  [9] LC_ADDRESS=C               LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] stats graphics grDevices utils datasets methods base
>
>
> Regards,
> Yihui
> --
> Yihui Xie <xieyihui at gmail.com>
> Phone: 515-294-2465 Web: http://yihui.name
> Department of Statistics, Iowa State University
> 2215 Snedecor Hall, Ames, IA
>
>
> On Wed, Jul 18, 2012 at 2:31 PM, Duncan Murdoch
> <murdoch.duncan at gmail.com> wrote:
>> I have just committed (in r59883) some changes to the R parser based on
>> Romain Francois' parser package. Packages that made use of parser will
>> hopefully find that the information in base R gives them what they need to
>> work with, but the data is not identical to
>> what parser recorded (since it was not consistent with some things already
>> in R). One reason for the change was that the parser in the parser package
>> was slightly different than the one in R; the hope is that by providing the
>> services in R, it will make maintenance easier for things like code
>> analysis, pretty printing, etc.
>>
>> See ?getParseData for details, and if you are maintaining a package that
>> depends on parser, feel free to ask me for help in the transition, or make
>> suggestions for changes if I've done something that causes you too much
>> trouble.
>>
>> Duncan Murdoch
>>
>> P.S. to Qiang Li: as mentioned privately, the goal for this change was to
>> reproduce output equivalent to what parser did, so I have not incorporated
>> your suggested change to outlaw expressions like "x[[1] ]" (with an
>> embedded space where it shouldn't be). After things settle down we can
>> consider that change and others.
>>
>> ______________________________________________
>> R-devel at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-devel