[Rd] Inconsistent behavior for the C API's R_ParseVector() ?

Laurent Gautier |g@ut|er @end|ng |rom gm@||@com
Sat Dec 14 23:29:50 CET 2019


Hi Simon,

Widespread errors would have caught my attention earlier, since that code
uses a single initialization of the embedded R, is used quite a bit, and
is covered by quite a few unit tests. This is the only situation I am aware
of in which an error occurs.

What is a "correct context", or initial context, that the code should be
called from? Searching for "context" in the R-exts manual does not return much.

Best,

Laurent


On Sat, Dec 14, 2019 at 12:20, Simon Urbanek <simon.urbanek using r-project.org>
wrote:

> Laurent,
>
> the main point here is that ParseVector(), just like any other R API, has to
> be called in a correct context since it can raise errors, so the issue was
> that your C code has a bug of not setting up R correctly (my guess would be
> that you're not creating the initial context necessary in embedded R). There
> are many different errors; yours is just one of many that can occur. Any R
> API call that does allocation (and parsing obviously does) can cause errors.
> Note that this is true for pretty much all R API functions.
>
> Cheers,
> Simon
>
>
>
> > On Dec 14, 2019, at 11:25 AM, Laurent Gautier <lgautier using gmail.com> wrote:
> >
> > On Mon, Dec 9, 2019 at 09:57, Tomas Kalibera <tomas.kalibera using gmail.com>
> > wrote:
> >
> >> On 12/9/19 2:54 PM, Laurent Gautier wrote:
> >>
> >>
> >>
> >> On Mon, Dec 9, 2019 at 05:43, Tomas Kalibera <tomas.kalibera using gmail.com>
> >> wrote:
> >>
> >>> On 12/7/19 10:32 PM, Laurent Gautier wrote:
> >>>
> >>> Thanks for the quick response Tomas.
> >>>
> >>> The same error is indeed happening when trying to have a zero-length
> >>> variable name in an environment. The surprising bit is then "why is this
> >>> happening during parsing" (that is, why are variables assigned to an
> >>> environment)?
> >>>
> >>> The emitted R error (in the R console) is not a parse (syntax) error, but
> >>> an error emitted during parsing when the parser tries to intern a name -
> >>> look it up in a symbol table. An empty string is not allowed as a symbol
> >>> name, hence the error. In the call "list(''=1)", the empty name is what
> >>> could eventually become the name of a local variable inside list(), even
> >>> though not yet during parsing.
> >>>
> >>
> >> Thanks Tomas.
> >>
> >> I guess this has to do with R expressions being lazily evaluated, and names
> >> of arguments in a call are also part of the expression. Now the puzzling
> >> part is why that is at all part of the parsing: I would have expected
> >> R_ParseVector() to be restricted to parsing... Now it feels like
> >> R_ParseVector() is performing parsing, and a first level of evaluation for
> >> expressions that "should never work" (the empty name).
> >>
> >> Think of it as an exception in, say, Python. Some failures during parsing
> >> result in an exception (called an error in R and implemented using a long
> >> jump). Any time you are calling into R you can get an error; out of memory
> >> is also signalled as an R error.
> >>
> >
> >
> > The surprising bit for me was that I had expected the function to solely
> > perform parsing. I did not expect an exception (and a jmp smashing the
> > stack) when the function concerned is in the C API, is parsing a string,
> > and is using a parameter (pointer) to store whether parsing was a failure
> > or a success.
> >
> > Since you are making a comparison with Python, the distinction I am making
> > between parsing and evaluation seems to apply there. For example:
> >
> > ```
> > >>> import parser
> > >>> parser.expr('1+')
> > Traceback (most recent call last):
> >   File "<stdin>", line 1, in <module>
> >   File "<string>", line 1
> >     1+
> >      ^
> > SyntaxError: unexpected EOF while parsing
> > >>> p = parser.expr('list(""=1)')
> > >>> p
> > <parser.st at 0x7f360e5329f0>
> > >>> eval(p)
> > Traceback (most recent call last):
> >   File "<stdin>", line 1, in <module>
> > TypeError: eval() arg 1 must be a string, bytes or code object
> >
> > >>> list(""=1)
> >   File "<stdin>", line 1
> > SyntaxError: keyword can't be an expression
> > ```
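
The `parser` module used in the session above was removed in Python 3.10, so
that session cannot be replayed on a current interpreter. The following is a
minimal sketch of the same parsing-versus-evaluation distinction using the
standard `ast` module instead. The helper name `classify` is hypothetical and
only serves the illustration. One difference from the quoted session: with
`ast.parse`, `list(""=1)` is rejected at the parsing step as a `SyntaxError`,
whereas `parser.expr` accepted it.

```python
import ast


def classify(source):
    """Report whether `source` fails at parse time, at evaluation time, or not at all."""
    try:
        # Parsing only: builds a syntax tree, nothing is evaluated.
        tree = ast.parse(source, mode="eval")
    except SyntaxError:
        return "syntax error"
    try:
        # Evaluation: compile the tree and run it with empty globals.
        eval(compile(tree, "<example>", "eval"), {})
    except Exception as exc:
        return f"evaluation error: {type(exc).__name__}"
    return "ok"


for text in ["1+", "undefined_name + 1", 'list(""=1)', "1 + 1"]:
    print(repr(text), "->", classify(text))
```

In every failing case the caller gets an ordinary Python exception that can
be caught, which is the behavior a caller might also hope for from a parsing
entry point in a C API.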
> >
> >
> >>> There is probably some error in how the external code is handling R
> >>> errors (Fatal error: unable to initialize the JIT, stack smashing, etc.)
> >>> and possibly also how R is initialized before calling ParseVector.
> >>> Probably you would get the same problem when running, say,
> >>> "stop('myerror')". Please note R errors are implemented as long-jumps,
> >>> so care has to be taken when calling into R. Writing R Extensions has
> >>> more details (and section 8 specifically about embedding R). This is
> >>> unlike parse (syntax) errors signaled via return value to ParseVector().
> >>>
> >>
> >> The issue is that the segfault (because of stack smashing, therefore
> >> because of what is also suspected to be an uncontrolled jump) is happening
> >> within the execution of R_ParseVector(). I would think that an issue with
> >> the initialization of R is less likely because the project is otherwise
> >> used a fair bit and is well covered by automated continuous tests.
> >>
> >> After looking more into R's gram.c I suspect that an execution context is
> >> required for R_ParseVector() to work properly (to know where to jump in
> >> case of error) when the parsing code decides to fail outside what it
> >> thinks is a syntax error. If that is the case, R_ParseVector() would
> >> function well when called from, say, a C extension to an R package, but
> >> fail the way I am seeing it fail when called from an embedded R.
> >>
> >> Yes, contexts are used internally to handle errors. For external use
> >> please see Writing R Extensions, section 6.12.
> >>
> >
> > I have wrapped my call to R_ParseVector() in a R_tryCatchError(), and this
> > seems to help me overcome the issue. Thanks for the pointer.
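
The thread describes two error channels: syntax problems are reported through
the `ParseStatus` out-parameter, while other failures raised during parsing
are R errors (long jumps) that need a handler such as `R_tryCatchError()`.
The following is a pure-Python toy model of that contract, not a binding to
R. Every name in it (`ParseStatus`, `RErrorModel`, `parse_vector_model`,
`guarded_parse`) is hypothetical, and the string checks only imitate the two
inputs discussed in this thread.

```python
from enum import Enum


class ParseStatus(Enum):
    OK = "ok"
    INCOMPLETE = "incomplete"
    ERROR = "error"


class RErrorModel(Exception):
    """Stand-in for an R error, which real R implements as a long jump."""


def parse_vector_model(text):
    """Toy parser: syntax problems come back as a status, other failures are raised."""
    stripped = text.rstrip()
    # A dangling operator is reported through the status channel.
    if stripped.endswith(("+", "-", "*", "/")):
        return None, ParseStatus.INCOMPLETE
    # An empty argument name is not a syntax problem: it escapes as an error.
    if "''=" in stripped or '""=' in stripped:
        raise RErrorModel("attempt to use zero-length variable name")
    if stripped.count("(") != stripped.count(")"):
        return None, ParseStatus.INCOMPLETE
    return stripped, ParseStatus.OK


def guarded_parse(text):
    """Wrapper playing the role of a handler installed around the parser call."""
    try:
        value, status = parse_vector_model(text)
    except RErrorModel as err:
        return None, ParseStatus.ERROR, str(err)
    return value, status, None


for text in ["list(''=1+", "list(''=123", "list(a=1)"]:
    print(repr(text), "->", guarded_parse(text))
```

Without the wrapper, the second input propagates an error to a caller that
only inspects the status. In the embedded C setting the analogous escape is
a long jump with no valid target, which matches the crash reported here.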
> >
> > Best,
> >
> >
> > Laurent
> >
> >
> >> Best
> >> Tomas
> >>
> >>
> >> Best,
> >>
> >> Laurent
> >>
> >>> Best,
> >>> Tomas
> >>>
> >>>
> >>> We are otherwise aware that the error is not occurring in the R console,
> >>> but can be traced to a call to R_ParseVector() in R's C API
> >>> (https://github.com/rpy2/rpy2/blob/master/rpy2/rinterface_lib/_rinterface_capi.py#L509).
> >>>
> >>> Our specific setup is calling an embedded R from Python, using the cffi
> >>> library. An error on our end was the first possibility considered, but the
> >>> puzzling specificity of the error (as shown below, other parsing errors
> >>> are handled properly) and the difficulty tracing what is happening in
> >>> R_ParseVector() made me ask whether someone on this list had a suggestion
> >>> about the possible issue.
> >>>
> >>> ```
> >>>
> >>> >>> import rpy2.rinterface as ri
> >>> >>> ri.initr()
> >>> >>> e = ri.parse("list(''=1+")
> >>> ---------------------------------------------------------------------------
> >>> RParsingError                             Traceback (most recent call last)
> >>> >>> e = ri.parse("list(''=123")
> >>> R[write to console]: Error: attempt to use zero-length variable name
> >>> R[write to console]: Fatal error: unable to initialize the JIT
> >>>
> >>> *** stack smashing detected ***: <unknown> terminated
> >>> ```
> >>>
> >>>
> >>> On Mon, Dec 2, 2019 at 06:37, Tomas Kalibera <tomas.kalibera using gmail.com>
> >>> wrote:
> >>>
> >>>> Dear Laurent,
> >>>>
> >>>> could you please provide a complete reproducible example where parsing
> >>>> results in a crash of R? Calling parse(text="list(''=123") from R works
> >>>> fine for me (gives Error: attempt to use zero-length variable name).
> >>>>
> >>>> I don't think the problem you observed could be related to the memory
> >>>> leak. The leak is on the heap, not stack.
> >>>>
> >>>> Zero-length names of elements in a list are allowed. They are not the
> >>>> same thing as zero-length variables in an environment. If you try to
> >>>> convert "lst" from your example to an environment, you would get the
> >>>> error (attempt to use zero-length variable name).
> >>>>
> >>>> Best
> >>>> Tomas
> >>>>
> >>>>
> >>>> On 11/30/19 11:55 PM, Laurent Gautier wrote:
> >>>>> Hi again,
> >>>>>
> >>>>> Besides R_ParseVector()'s possibly inconsistent behavior, R's handling
> >>>>> of zero-length named elements does not seem consistent either:
> >>>>>
> >>>>> ```
> >>>>> > lst <- list()
> >>>>> > lst[[""]] <- 1
> >>>>> > names(lst)
> >>>>> [1] ""
> >>>>> > list("" = 1)
> >>>>> Error: attempt to use zero-length variable name
> >>>>> ```
> >>>>>
> >>>>> Should the parser be made to accept as valid what is otherwise possible
> >>>>> when using `[[<-` ?
> >>>>>
> >>>>>
> >>>>> Best,
> >>>>>
> >>>>> Laurent
> >>>>>
> >>>>>
> >>>>>
> >>>>> On Sat, Nov 30, 2019 at 17:33, Laurent Gautier <lgautier using gmail.com>
> >>>>> wrote:
> >>>>>
> >>>>>> I found the following code comment in `src/main/gram.c`:
> >>>>>>
> >>>>>> ```
> >>>>>>
> >>>>>> /* Memory leak
> >>>>>>
> >>>>>> yyparse(), as generated by bison, allocates extra space for the parser
> >>>>>> stack using malloc(). Unfortunately this means that there is a memory
> >>>>>> leak in case of an R error (long-jump). In principle, we could define
> >>>>>> yyoverflow() to relocate the parser stacks for bison and allocate say on
> >>>>>> the R heap, but yyoverflow() is undocumented and somewhat complicated
> >>>>>> (we would have to replicate some macros from the generated parser here).
> >>>>>> The same problem exists at least in the Rd and LaTeX parsers in tools.
> >>>>>> */
> >>>>>>
> >>>>>> ```
> >>>>>>
> >>>>>> Could this be related to the issue?
> >>>>>>
> >>>>>> On Sat, Nov 30, 2019 at 14:04, Laurent Gautier <lgautier using gmail.com>
> >>>>>> wrote:
> >>>>>>
> >>>>>>> Hi,
> >>>>>>>
> >>>>>>> The behavior of
> >>>>>>> ```
> >>>>>>> SEXP R_ParseVector(SEXP, int, ParseStatus *, SEXP);
> >>>>>>> ```
> >>>>>>> defined in `src/include/R_ext/Parse.h` appears to be inconsistent
> >>>>>>> depending on the string to be parsed.
> >>>>>>>
> >>>>>>> Trying to parse a string such as `"list(''=1+"` sets the
> >>>>>>> `ParseStatus` to incomplete parsing error, but trying to parse
> >>>>>>> `"list(''=123"` will result in R sending a message to the console
> >>>>>>> (followed by a crash):
> >>>>>>>
> >>>>>>> ```
> >>>>>>> R[write to console]: Error: attempt to use zero-length variable name
> >>>>>>> R[write to console]: Fatal error: unable to initialize the JIT
> >>>>>>> *** stack smashing detected ***: <unknown> terminated
> >>>>>>> ```
> >>>>>>>
> >>>>>>> Is there a reason for the difference in behavior, and is there a
> >>>>>>> workaround?
> >>>>>>>
> >>>>>>> Thanks,
> >>>>>>>
> >>>>>>>
> >>>>>>> Laurent
> >>>>>>>
> >>>>>>>
> >>>>>      [[alternative HTML version deleted]]
> >>>>>
> >>>>> ______________________________________________
> >>>>> R-devel using r-project.org mailing list
> >>>>> https://stat.ethz.ch/mailman/listinfo/r-devel
> >>>>
> >>>>
> >>>>
> >>>
> >>
> >
> >
>
>
