[Rd] Inconsistent behavior for the C API's R_ParseVector() ?
Tomas Kalibera
tomas.kalibera using gmail.com
Mon Dec 9 15:57:53 CET 2019
On 12/9/19 2:54 PM, Laurent Gautier wrote:
>
>
> On Mon, Dec 9, 2019 at 05:43, Tomas Kalibera <tomas.kalibera using gmail.com
> <mailto:tomas.kalibera using gmail.com>> wrote:
>
> On 12/7/19 10:32 PM, Laurent Gautier wrote:
>> Thanks for the quick response Tomas.
>>
>> The same error is indeed happening when trying to have a
>> zero-length variable name in an environment. The surprising bit
>> is then "why is this happening during parsing" (that is why are
>> variables assigned to an environment) ?
>
> The emitted R error (in the R console) is not a parse (syntax)
> error, but an error emitted during parsing when the parser tries
> to intern a name - look it up in a symbol table. Empty string is
> not allowed as a symbol name, and hence the error. In the call
> "list(''=1)" , the empty name is what could eventually become a
> name of a local variable inside list(), even though not yet during
> parsing.
>
>
> Thanks Tomas.
>
> I guess this has to do with R expressions being lazily evaluated, and
> names of arguments in a call are also part of the expression. Now the
> puzzling part is why that is part of the parsing at all: I would have
> expected R_ParseVector() to be restricted to parsing... Now it feels
> like R_ParseVector() is performing parsing, and a first level of
> evaluation for expressions that "should never work" (the empty name).
Think of it as an exception in, say, Python. Some failures during parsing
result in an exception (called an error in R and implemented using a long
jump). Any time you are calling into R you can get an error; out of
memory is also signalled as an R error.
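
To make the two failure channels concrete, here is a toy model in plain C. It is only a sketch and not R's parser: `toy_parse`, `try_toy_parse`, `toy_status` and `toy_error_context` are hypothetical names, and the "grammar" recognises just the two inputs discussed in this thread. A syntax problem comes back through the status argument, while the empty name triggers a long jump that only works because the caller set up a jump target first.

```c
#include <assert.h>
#include <setjmp.h>
#include <string.h>

/* Hypothetical status codes, loosely modelled on the idea of a parse status. */
typedef enum { TOY_OK, TOY_INCOMPLETE } toy_status;

/* Jump target for the toy "error"; the caller must fill it with setjmp(). */
static jmp_buf toy_error_context;

/* Toy parser: distinguishes only the two failure kinds from the thread. */
static void toy_parse(const char *text, toy_status *status)
{
    size_t len;

    assert(text != NULL && status != NULL);
    len = strlen(text);

    /* Channel 1: a trailing operator is a syntax problem, reported via status. */
    if (len > 0 && text[len - 1] == '+') {
        *status = TOY_INCOMPLETE;
        return;
    }

    /* Channel 2: an empty quoted name is an "error": it never returns. */
    if (strstr(text, "''=") != NULL)
        longjmp(toy_error_context, 1);

    *status = TOY_OK;
}

/* Returns 0 if toy_parse() returned normally, 1 if it jumped out instead. */
static int try_toy_parse(const char *text, toy_status *status)
{
    if (setjmp(toy_error_context) != 0)
        return 1;               /* we arrived here through longjmp() */
    toy_parse(text, status);
    return 0;
}
```

With this model, `try_toy_parse("list(''=1+", &st)` returns 0 and sets `st` to `TOY_INCOMPLETE`, whereas `try_toy_parse("list(''=123", &st)` returns 1 without touching `st`. Calling `toy_parse()` directly, without the `setjmp()` in `try_toy_parse()`, would jump through an uninitialised `jmp_buf`, which is undefined behaviour.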
>
> There is probably some error in how the external code is handling
> R errors (Fatal error: unable to initialize the JIT, stack
> smashing, etc) and possibly also how R is initialized before
> calling ParseVector. Probably you would get the same problem when
> running say "stop('myerror')". Please note R errors are
> implemented as long-jumps, so care has to be taken when calling
> into R. Writing R Extensions has more details (and section 8
> specifically about embedding R). This is unlike parse (syntax)
> errors, which are signaled via the ParseStatus set by R_ParseVector().
>
>
> The issue is that the segfault (because of stack smashing, and therefore
> because of what is also suspected to be an uncontrolled jump) is
> happening within the execution of R_ParseVector(). I would think that
> an issue with the initialization of R is less likely because the
> project is otherwise used a fair bit and is well covered by automated
> continuous tests.
>
> After looking more into R's gram.c I suspect that an execution context
> is required for R_ParseVector() to work properly (to know where
> to jump in case of error) when the parsing code decides to fail
> outside what it thinks is a syntax error. If that is the case, this would
> make R_ParseVector() function well when called from, say, a C extension to
> an R package, but fail the way I am seeing it fail when called from an
> embedded R.
Yes, contexts are used internally to handle errors. For external use
please see Writing R Extensions, section 6.12.
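
As an illustration of what "a context" buys the caller, here is a small self-contained sketch in plain C. It does not use R: `toy_toplevel_exec`, `toy_error` and `current_context` are hypothetical names, and the wrapper only imitates the calling convention of entry points such as R_ToplevelExec() (a callback plus a `void *` argument, returning whether the callback finished normally). The point it tries to show is that the wrapper installs a jump target before calling the callback and restores the previous one afterwards, so an error always has somewhere valid to land.

```c
#include <assert.h>
#include <setjmp.h>
#include <stdbool.h>
#include <stddef.h>

/* Innermost active error context, or NULL when none has been installed. */
static jmp_buf *current_context = NULL;

/* Toy stand-in for raising an error: jump to the innermost context. */
static void toy_error(void)
{
    /* With no context installed there is no valid place to jump to. */
    assert(current_context != NULL);
    longjmp(*current_context, 1);
}

/* Run fun(data) inside a fresh context.
   Returns true if fun returned normally, false if it raised toy_error(). */
static bool toy_toplevel_exec(void (*fun)(void *), void *data)
{
    jmp_buf here;
    jmp_buf *volatile saved = current_context;  /* remember the outer context */
    bool ok;

    current_context = &here;
    if (setjmp(here) == 0) {
        fun(data);
        ok = true;
    } else {
        ok = false;             /* reached through longjmp() in toy_error() */
    }
    current_context = saved;    /* always restore, on both paths */
    return ok;
}

/* Two example callbacks (hypothetical): one succeeds, one fails. */
static void ok_task(void *data)      { *(int *)data = 42; }
static void failing_task(void *data) { (void)data; toy_error(); }

/* A callback that itself runs a failing task in a nested context. */
static void nesting_task(void *data)
{
    *(bool *)data = toy_toplevel_exec(failing_task, NULL);
}
```

Because the previous context is saved and restored, calls can nest: `nesting_task` catches the inner failure and still returns normally to its own caller.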
Best
Tomas
> Best,
>
> Laurent
>
> Best,
> Tomas
>
>>
>> We are otherwise aware that the error is not occurring in the R
>> console, but can be traced to a call to R_ParseVector() in R's C
>> API (https://github.com/rpy2/rpy2/blob/master/rpy2/rinterface_lib/_rinterface_capi.py#L509).
>>
>> Our specific setup is calling an embedded R from Python, using
>> the cffi library. An error on our end was the first possibility
>> considered, but the puzzling specificity of the error (as shown
>> below, other parsing errors are handled properly) and the
>> difficulty tracing what is happening in R_ParseVector() made
>> me ask whether someone on this list had a suggestion about the
>> possible issue.
>>
>> ```
>> >>> import rpy2.rinterface as ri
>> >>> ri.initr()
>> >>> e = ri.parse("list(''=1+")
>> ---------------------------------------------------------------------------
>> RParsingError                             Traceback (most recent call last)
>> >>> e = ri.parse("list(''=123")
>> R[write to console]: Error: attempt to use zero-length variable name
>> R[write to console]: Fatal error: unable to initialize the JIT
>> *** stack smashing detected ***: <unknown> terminated
>> ```
>>
>> On Mon, Dec 2, 2019 at 06:37, Tomas Kalibera
>> <tomas.kalibera using gmail.com <mailto:tomas.kalibera using gmail.com>> wrote:
>>
>> Dear Laurent,
>>
>> could you please provide a complete reproducible example where
>> parsing results in a crash of R? Calling parse(text="list(''=123")
>> from R works fine for me (gives Error: attempt to use zero-length
>> variable name).
>>
>> I don't think the problem you observed could be related to the
>> memory leak. The leak is on the heap, not stack.
>>
>> Zero-length names of elements in a list are allowed. They are not
>> the same thing as zero-length variables in an environment. If you
>> try to convert "lst" from your example to an environment, you would
>> get the error (attempt to use zero-length variable name).
>>
>> Best
>> Tomas
>>
>>
>> On 11/30/19 11:55 PM, Laurent Gautier wrote:
>> > Hi again,
>> >
>> > Besides R_ParseVector()'s possible inconsistent behavior, R's
>> > handling of zero-length named elements does not seem consistent
>> > either:
>> >
>> > ```
>> >> lst <- list()
>> >> lst[[""]] <- 1
>> >> names(lst)
>> > [1] ""
>> >> list("" = 1)
>> > Error: attempt to use zero-length variable name
>> > ```
>> >
>> > Should the parser be made to accept as valid what is otherwise
>> > possible when using `[[<-` ?
>> >
>> >
>> > Best,
>> >
>> > Laurent
>> >
>> >
>> >
>> > On Sat, Nov 30, 2019 at 17:33, Laurent Gautier
>> > <lgautier using gmail.com <mailto:lgautier using gmail.com>> wrote:
>> >
>> >> I found the following code comment in `src/main/gram.c`:
>> >>
>> >> ```
>> >>
>> >> /* Memory leak
>> >>
>> >> yyparse(), as generated by bison, allocates extra space for the
>> >> parser stack using malloc(). Unfortunately this means that there
>> >> is a memory leak in case of an R error (long-jump). In principle,
>> >> we could define yyoverflow() to relocate the parser stacks for
>> >> bison and allocate say on the R heap, but yyoverflow() is
>> >> undocumented and somewhat complicated (we would have to replicate
>> >> some macros from the generated parser here).
>> >> The same problem exists at least in the Rd and LaTeX parsers in
>> >> tools.
>> >> */
>> >>
>> >> ```
>> >>
>> >> Could this be related to the issue ?
>> >>
>> >> On Sat, Nov 30, 2019 at 14:04, Laurent Gautier
>> >> <lgautier using gmail.com <mailto:lgautier using gmail.com>> wrote:
>> >>
>> >>> Hi,
>> >>>
>> >>> The behavior of
>> >>> ```
>> >>> SEXP R_ParseVector(SEXP, int, ParseStatus *, SEXP);
>> >>> ```
>> >>> defined in `src/include/R_ext/Parse.h` appears to be inconsistent
>> >>> depending on the string to be parsed.
>> >>>
>> >>> Trying to parse a string such as `"list(''=1+"` sets the
>> >>> `ParseStatus` to incomplete parsing error but trying to parse
>> >>> `"list(''=123"` will result in R sending a message to the
>> >>> console (followed by a crash):
>> >>>
>> >>> ```
>> >>> R[write to console]: Error: attempt to use zero-length variable name
>> >>> R[write to console]: Fatal error: unable to initialize the JIT
>> >>> *** stack smashing detected ***: <unknown> terminated
>> >>> ```
>> >>>
>> >>> Is there a reason for the difference in behavior, and is there a
>> >>> workaround ?
>> >>>
>> >>> Thanks,
>> >>>
>> >>>
>> >>> Laurent
>> >>>
>> >>>
>> > [[alternative HTML version deleted]]
>> >
>> > ______________________________________________
>> > R-devel using r-project.org <mailto:R-devel using r-project.org> mailing list
>> > https://stat.ethz.ch/mailman/listinfo/r-devel
>>
>>
>