[Rd] Inconsistent Parse Behavior
brodie gaslam
brodie.gaslam at yahoo.com
Wed Dec 24 15:00:26 CET 2014
Under some specific conditions, `parse` seems to produce inconsistent and potentially incorrect results the first time it is run in a fresh clean R session. Consider this code where we parse the same text twice in a row, and get one value in the parse data that is mismatched:
```
Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.
> txt <- 'c("", {
+ c(integer(3L), 1:3)
+ c(integer(), 1:3, 1L) # TRUE
+ c(integer(), c(1, 2, 3), 1L) # TRUE
+ } )
+ c("", {
+ lst <- list(list( 1, 2), list( 3, list( 4, list( 5, list(6, 6.1, 6.2)))))
+ } )
+ c("", {
+ TRUE
+ } )'
> prs1 <- parse(text=txt, keep.source=TRUE)
> prs2 <- parse(text=txt, keep.source=TRUE)
> which(attr(prs1, "srcfile")$parseData != attr(prs2, "srcfile")$parseData)
[1] 1176
> sessionInfo()
R version 3.1.2 (2014-10-31)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
```
This discrepancy disappears if I simplify the text being parsed in any way. The code shown is already a much simplified version of the code that first produced the error for me; I cannot reduce it further without also eliminating the error.
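A linear index into the raw `parseData` matrix is hard to read, so the same comparison can be done row by row on the `getParseData` data frames. This is only a minimal sketch: `diff_parse_data` is a hypothetical helper name, not part of base R, and it assumes both parses produce the same number of rows in the same order.

```r
# Hypothetical helper (not part of base R): return the parse-data rows that
# differ between two parses of the same text.
diff_parse_data <- function(p1, p2) {
  pd1 <- utils::getParseData(p1)
  pd2 <- utils::getParseData(p2)
  # Assumption: both parses yield the same number of rows in the same order.
  stopifnot(nrow(pd1) == nrow(pd2))
  cols <- c("line1", "col1", "line2", "col2", "id", "parent")
  # Compare the integer columns element-wise and flag rows with any mismatch.
  differs <- rowSums(as.matrix(pd1[, cols]) != as.matrix(pd2[, cols])) > 0
  list(first = pd1[differs, ], second = pd2[differs, ])
}

# Usage on a small text, where no mismatch is expected:
small <- "c(1, {\n  2\n})"
d <- diff_parse_data(parse(text = small, keep.source = TRUE),
                     parse(text = small, keep.source = TRUE))
nrow(d$first)
```

Calling `diff_parse_data(prs1, prs2)` in the fresh session would then show the mismatched rows side by side, with their `id` and `parent` columns, instead of a bare index.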
Unfortunately, the discrepancy is meaningful. The problem is in the first parse. Looking at the `getParseData` output:
```
> subset(getParseData(prs1), id %in% c(226, 234))
    line1 col1 line2 col2  id parent token terminal text
226     6    1     8    3 226    234  expr    FALSE
234     9    5     9    5 234    251   ','     TRUE    ,
```
Notice that item 226 has item 234 as its parent, yet item 234 starts on line 9, col 5, after item 226 ends (line 8, col 3). I'm not sure how this is possible.
In the second parse, the parse data is as one would expect:
```
> subset(getParseData(prs2), id == 226)
    line1 col1 line2 col2  id parent token terminal text
226     6    1     8    3 226      0  expr    FALSE
```
The parent here is the top level (0), as would be expected looking at the source code in `txt` (226 represents the second `c(...)` block).
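The inconsistency described above (a parent whose source range does not enclose its child) can be checked for mechanically. The following is a rough sketch, not a definitive test: `find_bad_parents` is a hypothetical helper, and it assumes that in well-formed parse data every non-top-level item lies within its parent's range. The two sample rows are copied from the `prs1` output shown earlier.

```r
# Hypothetical helper (not part of base R): list parse-data rows whose parent
# row does not enclose them in the source.
find_bad_parents <- function(pd) {
  # Look up each row's parent; parents that are absent from `pd` (such as the
  # top-level parent 0) give NA and are skipped.
  idx <- match(pd$parent, pd$id)
  has_parent <- !is.na(idx)
  par <- pd[idx[has_parent], ]
  kid <- pd[has_parent, ]
  # The parent must start at or before the child and end at or after it.
  starts_before <- par$line1 < kid$line1 |
    (par$line1 == kid$line1 & par$col1 <= kid$col1)
  ends_after <- par$line2 > kid$line2 |
    (par$line2 == kid$line2 & par$col2 >= kid$col2)
  kid[!(starts_before & ends_after), ]
}

# The two rows reported for the first parse, entered by hand:
bad <- data.frame(line1 = c(6L, 9L), col1 = c(1L, 5L),
                  line2 = c(8L, 9L), col2 = c(3L, 5L),
                  id = c(226L, 234L), parent = c(234L, 251L))
find_bad_parents(bad)$id   # item 226 is flagged

# A small, ordinary parse should flag nothing:
ok <- utils::getParseData(parse(text = "x <- c(1, 2)\ny <- {\n  x + 1\n}",
                                keep.source = TRUE))
nrow(find_bad_parents(ok))
```

Running `find_bad_parents(getParseData(prs1))` in a fresh session would be one way to confirm whether item 226 is the only affected row.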
I suspect the problem is caused by the use of `{}` inside a function call (`c(...)` in the example above), but again, it is not that simple, since any further simplification of my code seems to resolve the problem. I also don't know why it works fine the second time, though presumably some parser state is initialized during the first call.
Any help appreciated.
Best,
Brodie