[Rd] Inconsistent behavior for the C API's R_ParseVector() ?

Simon Urbanek @|mon@urb@nek @end|ng |rom R-project@org
Sat Dec 14 18:04:20 CET 2019


Laurent,

the main point here is that R_ParseVector(), just like any other R API function, has to be called in a correct context since it can raise errors. So the issue was that your C code has a bug: it does not set up R correctly (my guess would be that you're not creating the initial context necessary in embedded R). There are many different errors, and yours is just one of many that can occur: any R API call that does allocation (and parsing obviously does) can cause errors. Note that this is true for pretty much all R API functions.
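
To make the point about contexts concrete, here is a minimal sketch in plain C of the two failure channels at play. It deliberately does not use R's headers: ToyStatus, toy_error(), toy_parse() and guarded_parse() are hypothetical names invented for this illustration, and the single jmp_buf only stands in for R's context stack. The assumption is simply that a syntax problem comes back through the status pointer, while any other error long-jumps to whatever target the caller established beforehand.

```c
#include <assert.h>
#include <setjmp.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for illustration only; not R's real types. */
typedef enum { TOY_PARSE_OK, TOY_PARSE_INCOMPLETE } ToyStatus;

/* Jump target for "errors"; plays the role of the calling context. */
static jmp_buf toy_context;
static const char *toy_error_message = NULL;

/* Raise a toy error: never returns, control resumes at setjmp(). */
static void toy_error(const char *msg) {
    toy_error_message = msg;
    longjmp(toy_context, 1);
}

/* Toy parser with two failure channels:
 * - input ending in '+' is reported through *status (syntax-level problem);
 * - an empty argument name (''=) raises a toy error via long jump. */
static void toy_parse(const char *text, ToyStatus *status) {
    size_t n = strlen(text);
    if (n > 0 && text[n - 1] == '+') {
        *status = TOY_PARSE_INCOMPLETE;
        return;
    }
    if (strstr(text, "''=") != NULL)
        toy_error("attempt to use zero-length variable name");
    *status = TOY_PARSE_OK;
}

/* Establish the jump target BEFORE calling into the parser.
 * Returns 0 if toy_parse() returned normally, 1 if it raised a toy error. */
static int guarded_parse(const char *text, ToyStatus *status) {
    assert(status != NULL);
    if (setjmp(toy_context) != 0)
        return 1; /* arrived here via longjmp() from toy_error() */
    toy_parse(text, status);
    return 0;
}
```

With this sketch, guarded_parse("list(''=1+", &status) returns 0 and sets status to TOY_PARSE_INCOMPLETE, while guarded_parse("list(''=123", &status) returns 1 and leaves the message in toy_error_message. Calling toy_parse() directly, without the setjmp() in guarded_parse(), would long-jump through an uninitialized jmp_buf, which is undefined behavior in C and plausibly the kind of stack corruption reported in this thread. In real embedding code the jump target is set up through R itself, for example with R_tryCatchError() or R_ToplevelExec().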

Cheers,
Simon



> On Dec 14, 2019, at 11:25 AM, Laurent Gautier <lgautier using gmail.com> wrote:
> 
> On Mon, Dec 9, 2019 at 09:57, Tomas Kalibera <tomas.kalibera using gmail.com>
> wrote:
> 
>> On 12/9/19 2:54 PM, Laurent Gautier wrote:
>> 
>> 
>> 
>> On Mon, Dec 9, 2019 at 05:43, Tomas Kalibera <tomas.kalibera using gmail.com>
>> wrote:
>> 
>>> On 12/7/19 10:32 PM, Laurent Gautier wrote:
>>> 
>>> Thanks for the quick response Tomas.
>>> 
>>> The same error is indeed happening when trying to have a zero-length
>>> variable name in an environment. The surprising bit is then "why is this
>>> happening during parsing" (that is why are variables assigned to an
>>> environment) ?
>>> 
>>> The emitted R error (in the R console) is not a parse (syntax) error, but
>>> an error emitted during parsing when the parser tries to intern a name -
>>> look it up in a symbol table. Empty string is not allowed as a symbol name,
>>> and hence the error. In the call "list(''=1)" , the empty name is what
>>> could eventually become a name of a local variable inside list(), even
>>> though not yet during parsing.
>>> 
>> 
>> Thanks Tomas.
>> 
>> I guess this has to do with R expressions being lazily evaluated, and names
>> of arguments in a call are also part of the expression. Now the puzzling
>> part is why is that at all part of the parsing: I would have expected
>> R_ParseVector() to be restricted to parsing... Now it feels like
>> R_ParseVector() is performing parsing, and a first level of evaluation for
>> expressions that "should never work" (the empty name).
>> 
>> Think of it as an exception in say Python. Some failures during parsing
>> result in an exception (called error in R and implemented using a long
>> jump). Any time you are calling into R you can get an error; out of memory
>> is also signalled as R error.
>> 
> 
> 
> The surprising bit for me was that I had expected the function to solely
> perform parsing. I did not expect an exception (and a jmp smashing the stack)
> when the function concerned is in the C-API, is parsing a string, and is
> using a parameter (pointer) to store whether parsing was a failure or a
> success.
> 
> Since you are making a comparison with Python, the distinction I am making
> between parsing and evaluation seems to apply there. For example:
> 
> ```
> >>> import parser
> >>> parser.expr('1+')
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "<string>", line 1
>     1+
>      ^
> SyntaxError: unexpected EOF while parsing
> >>> p = parser.expr('list(""=1)')
> >>> p
> <parser.st at 0x7f360e5329f0>
> >>> eval(p)
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> TypeError: eval() arg 1 must be a string, bytes or code object
> 
> >>> list(""=1)
>   File "<stdin>", line 1
> SyntaxError: keyword can't be an expression
> ```
> 
> 
>>> There is probably some error in how the external code is handling R
>>> errors (Fatal error: unable to initialize the JIT, stack smashing, etc.)
>>> and possibly also how R is initialized before calling ParseVector. Probably
>>> you would get the same problem when running say "stop('myerror')". Please
>>> note R errors are implemented as long-jumps, so care has to be taken when
>>> calling into R; Writing R Extensions has more details (and section 8
>>> specifically about embedding R). This is unlike parse (syntax) errors,
>>> which are signaled via return value to ParseVector().
>>> 
>> 
>> The issue is that the segfault (because of stack smashing, and therefore
>> because of what I also suspect to be an uncontrolled jump) is happening
>> within the execution of R_ParseVector(). I would think that an issue with
>> the initialization of R is less likely because the project is otherwise
>> used a fair bit and is well covered by automated continuous tests.
>> 
>> After looking more into R's gram.c I suspect that an execution context is
>> required for R_ParseVector() to work properly (to know where to jump
>> in case of error) when the parsing code decides to fail outside what it
>> thinks is a syntax error. If that is the case, this would make R_ParseVector()
>> function well when called from say, a C-extension to an R package, but fail
>> the way I am seeing it fail when called from an embedded R.
>> 
>> Yes, contexts are used internally to handle errors. For external use
>> please see Writing R Extensions, section 6.12.
>> 
> 
> I have wrapped my call to R_ParseVector() in R_tryCatchError(), and this
> seems to help me overcome the issue. Thanks for the pointer.
> 
> Best,
> 
> 
> Laurent
> 
> 
>> Best
>> Tomas
>> 
>> 
>> Best,
>> 
>> Laurent
>> 
>>> Best,
>>> Tomas
>>> 
>>> 
>>> We are otherwise aware that the error is not occurring in the R console,
>>> but can be traced to a call to R_ParseVector() in R's C API:(
>>> https://github.com/rpy2/rpy2/blob/master/rpy2/rinterface_lib/_rinterface_capi.py#L509
>>> ).
>>> 
>>> Our specific setup is calling an embedded R from Python, using the cffi
>>> library. An error on our end was the first possibility considered, but the
>>> puzzling specificity of the error (as shown below, other parsing errors are
>>> handled properly) and the difficulty tracing what is happening in
>>> R_ParseVector() made me ask whether someone on this list had a suggestion
>>> about the possible issue:
>>> 
>>> ```
>>> 
>>> >>> import rpy2.rinterface as ri
>>> >>> ri.initr()
>>> >>> e = ri.parse("list(''=1+")
>>> ---------------------------------------------------------------------------
>>> RParsingError                             Traceback (most recent call last)
>>> >>> e = ri.parse("list(''=123")
>>> R[write to console]: Error: attempt to use zero-length variable name
>>> R[write to console]: Fatal error: unable to initialize the JIT
>>> 
>>> *** stack smashing detected ***: <unknown> terminated
>>> ```
>>> 
>>> 
>>> On Mon, Dec 2, 2019 at 06:37, Tomas Kalibera <tomas.kalibera using gmail.com>
>>> wrote:
>>> 
>>>> Dear Laurent,
>>>> 
>>>> could you please provide a complete reproducible example where parsing
>>>> results in a crash of R? Calling parse(text="list(''=123") from R works
>>>> fine for me (gives Error: attempt to use zero-length variable name).
>>>> 
>>>> I don't think the problem you observed could be related to the memory
>>>> leak. The leak is on the heap, not stack.
>>>> 
>>>> Zero-length names of elements in a list are allowed. They are not the
>>>> same thing as zero-length variables in an environment. If you try to
>>>> convert "lst" from your example to an environment, you would get the
>>>> error (attempt to use zero-length variable name).
>>>> 
>>>> Best
>>>> Tomas
>>>> 
>>>> 
>>>> On 11/30/19 11:55 PM, Laurent Gautier wrote:
>>>>> Hi again,
>>>>> 
>>>>> Besides R_ParseVector()'s possibly inconsistent behavior, R's handling of
>>>>> zero-length named elements does not seem consistent either:
>>>>> 
>>>>> ```
>>>>> > lst <- list()
>>>>> > lst[[""]] <- 1
>>>>> > names(lst)
>>>>> [1] ""
>>>>> > list("" = 1)
>>>>> Error: attempt to use zero-length variable name
>>>>> ```
>>>>> 
>>>>> Should the parser be made to accept as valid what is otherwise possible
>>>>> when using `[[<-` ?
>>>>> 
>>>>> 
>>>>> Best,
>>>>> 
>>>>> Laurent
>>>>> 
>>>>> 
>>>>> 
>>>>> On Sat, Nov 30, 2019 at 17:33, Laurent Gautier <lgautier using gmail.com>
>>>>> wrote:
>>>>> 
>>>>>> I found the following code comment in `src/main/gram.c`:
>>>>>> 
>>>>>> ```
>>>>>> 
>>>>>> /* Memory leak
>>>>>> 
>>>>>> yyparse(), as generated by bison, allocates extra space for the parser
>>>>>> stack using malloc(). Unfortunately this means that there is a memory
>>>>>> leak in case of an R error (long-jump). In principle, we could define
>>>>>> yyoverflow() to relocate the parser stacks for bison and allocate say on
>>>>>> the R heap, but yyoverflow() is undocumented and somewhat complicated
>>>>>> (we would have to replicate some macros from the generated parser here).
>>>>>> The same problem exists at least in the Rd and LaTeX parsers in tools.
>>>>>> */
>>>>>> 
>>>>>> ```
>>>>>> 
>>>>>> Could this be related to the issue?
>>>>>> 
>>>>>> On Sat, Nov 30, 2019 at 14:04, Laurent Gautier <lgautier using gmail.com>
>>>>>> wrote:
>>>>>> 
>>>>>>> Hi,
>>>>>>> 
>>>>>>> The behavior of
>>>>>>> ```
>>>>>>> SEXP R_ParseVector(SEXP, int, ParseStatus *, SEXP);
>>>>>>> ```
>>>>>>> defined in `src/include/R_ext/Parse.h` appears to be inconsistent
>>>>>>> depending on the string to be parsed.
>>>>>>> 
>>>>>>> Trying to parse a string such as `"list(''=1+"` sets the
>>>>>>> `ParseStatus` to incomplete parsing error, but trying to parse
>>>>>>> `"list(''=123"` will result in R sending a message to the console
>>>>>>> (followed by a crash):
>>>>>>> 
>>>>>>> ```
>>>>>>> R[write to console]: Error: attempt to use zero-length variable name
>>>>>>> R[write to console]: Fatal error: unable to initialize the JIT
>>>>>>> *** stack smashing detected ***: <unknown> terminated
>>>>>>> ```
>>>>>>> 
>>>>>>> Is there a reason for the difference in behavior, and is there a
>>>>>>> workaround?
>>>>>>> 
>>>>>>> Thanks,
>>>>>>> 
>>>>>>> 
>>>>>>> Laurent
>>>>>>> 
>>>>>>> 
>>>>> 
>>>>> ______________________________________________
>>>>> R-devel using r-project.org mailing list
>>>>> https://stat.ethz.ch/mailman/listinfo/r-devel
>>>> 
>>>> 
>>>> 
>>> 
>> 
> 


