[Rd] String interpolation [Was: string concatenation operator (revisited)]

Taras Zakharko <taras.zakharko using uzh.ch>
Wed Dec 8 08:55:13 CET 2021


> I don't think a custom type alone would work, because users would expect to use such string anywhere a regular string can be used, and that's where the problems start - the evaluation would have to happen at a point where it is not expected since we can assume today that CHAR() doesn't evaluate. If it's just a construct that needs some function call to turn it into a real string, then that's (from user's perspective) no different than glue()
> 

Oh, it would still be evaluated as expected. It would just be a new type of language expression, just like byte code, a call, or a promise. You just need a new case in the switch statement of eval(). The rest is lazy evaluation as usual; no change of rules is needed. Of course, some rules need to be established on when exactly the evaluation kicks in (and this can be a bit tricky), but I am sure one can figure out a sane approach. My intuition would be to evaluate a format string any time one evaluates a promise. In fact, it could probably be treated as a special type of promise itself, with value caching and all. Under this approach the end user would never see the special type: every time you assign a formatted string somewhere, it gets evaluated to a plain old character vector. But if it is passed as an argument, you get the benefits of lazy evaluation.
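
To make the promise behaviour concrete, here is a small sketch of what today's R already does with lazily evaluated, cached arguments. It illustrates existing promise semantics only, not the proposed format-string type (which does not exist); the names counter, noisy and use_twice are made up for the example.

```r
# Illustrative only: existing promise semantics, not the proposed type.
counter <- 0
noisy <- function() {
  counter <<- counter + 1   # record every actual evaluation
  "hello"
}

use_twice <- function(x) {
  before <- counter         # the promise for `x` has not been forced yet
  first  <- x               # forces the promise: noisy() runs once
  second <- x               # cached value is reused; noisy() does not run again
  list(before = before, after = counter, value = paste(first, second))
}

res <- use_twice(noisy())
# res$before is 0, res$after is 1, res$value is "hello hello"
```

A format string treated as a special kind of promise would presumably follow the same pattern: nothing happens until the value is needed, and the work is done only once.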

What functions could do is suspend the evaluation to check whether an argument is a (processed) format string and apply custom formatting to it. Again, this is no different from today’s R, where you can capture the lazy expression and apply transformations to it. The R parser would just do some basic preprocessing for you.
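
As a rough sketch of that pattern in current R: below, a call to a hypothetical fstr() stands in for what the parser would produce from a format string, and a hypothetical shout() inspects its unevaluated argument with substitute() before deciding how to format it. Neither function exists in R or in any package; both are assumptions for illustration.

```r
# Hypothetical stand-in for a parsed format string: a call to fstr() whose
# arguments are the literal pieces and the embedded expressions.
fstr <- function(...) paste0(...)          # default evaluation: plain paste0()

shout <- function(msg) {
  expr <- substitute(msg)                  # capture the argument unevaluated
  env  <- parent.frame()                   # where embedded expressions live
  if (is.call(expr) && identical(expr[[1]], as.name("fstr"))) {
    # Custom formatting: evaluate each piece ourselves, upper-casing
    # only the interpolated values, not the literal text.
    parts <- lapply(as.list(expr)[-1], function(p) {
      if (is.character(p)) p else toupper(eval(p, env))
    })
    return(do.call(paste0, parts))
  }
  msg                                      # anything else: ordinary evaluation
}

name   <- "world"
custom <- shout(fstr("hello ", name))      # "hello WORLD"
plain  <- shout("hello world")             # "hello world"
dflt   <- fstr("hello ", name)             # "hello world"
```

With a dedicated expression type, the is.call() check would become a check on that type, and the pieces would already be separated by the parser.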

> admittedly, you could do a lot more with such internal type, but not sure if the complexity is worth it

That’s the question :) I am not sure either. It was just a spontaneous idea I threw out there, not the result of careful deliberation. Still, I believe it can be useful to think about things like that; it might give the right person just the right idea.


> For what it's worth, you can also get 90% of the way there with:
> 
>    f <- glue::glue
>    f("if you squint, this is a Python f-string")
> 
> ...
> 
> That said, if something like this were to happen in R, my vote would
> be an implementation in the parser that transformed f"string" into
> something like 'interpolate("string")', so that f"string" would just
> become syntactic sugar for already-existing code

Not really. With this approach the embedded expressions would still be parsed at evaluation time, so you don’t get any of the potential benefits that come from my suggestion (expression parsing at parse time, higher runtime performance, correctly captured expression promises).
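
The difference between parsing once up front and parsing on every evaluation can be approximated in user code. The sketch below uses a hypothetical compile_template() (not part of R or glue) that parses the {...} pieces a single time with str2lang(), so syntax errors surface immediately, and returns a function that only evaluates. It assumes the embedded expressions contain no braces of their own.

```r
# Hypothetical helper: parse the embedded expressions once, evaluate later.
compile_template <- function(template) {
  m     <- gregexpr("\\{[^{}]*\\}", template)
  holes <- regmatches(template, m)[[1]]                  # "{name}", ...
  texts <- regmatches(template, m, invert = TRUE)[[1]]   # literal text around them
  # Parse every embedded expression now; a syntax error is raised here,
  # not later when the string is actually used.
  exprs <- lapply(holes, function(h) str2lang(substr(h, 2, nchar(h) - 1)))
  function(env = parent.frame()) {
    force(env)
    vals <- vapply(exprs, function(e) as.character(eval(e, env)), character(1))
    # Interleave: text, value, text, value, ..., text
    paste0(c(rbind(texts, c(vals, ""))), collapse = "")
  }
}

greet <- compile_template("hello {name}, you are {age + 1}")
name <- "Ana"
age  <- 41
out  <- greet()                       # "hello Ana, you are 42"

# A malformed expression is rejected when the template is compiled:
early <- tryCatch(compile_template("bad {1 +}"),
                  error = function(e) "syntax error")
```

A parser-level implementation would get the same early checking for free, and could additionally attach srcrefs to the parsed pieces, which this sketch does not attempt.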

One quick note about parser transformations: lazy evaluation with expression capturing (substitution) is one of the unique strengths of R, as it allows one to trivially implement powerful DSLs on top of the language (as demonstrated by the “tidy evaluation” implementation in the tidyverse). Parser transformations might make the implementation simpler, but they remove information from the parse tree and reduce opportunities.
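
An existing example of this trade-off, offered as an illustration rather than something raised in this thread, is the native pipe in R 4.1 and later. It is rewritten by the parser, so a function that captures its argument can no longer tell that a pipe was used. The helper show_expr() is made up for the example.

```r
# Requires R >= 4.1 for the native pipe.
piped  <- quote(x |> f(y))
direct <- quote(f(x, y))
same   <- identical(piped, direct)     # TRUE: the pipe is gone after parsing

# A capturing function only ever sees the transformed call.
show_expr <- function(arg) deparse(substitute(arg))
seen <- show_expr(x |> f(y))           # "f(x, y)"
```

If f"string" were rewritten to interpolate("string") in the same way, code that captures expressions would see only the interpolate() call.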

— Taras


> On 8 Dec 2021, at 00:13, Kevin Ushey <kevinushey using gmail.com> wrote:
> 
> For what it's worth, you can also get 90% of the way there with:
> 
>    f <- glue::glue
>    f("if you squint, this is a Python f-string")
> 
> Having this in an add-on package also makes it much easier to change
> in response to user feedback; R packages have more freedom to make
> backwards-incompatible changes.
> 
> That said, if something like this were to happen in R, my vote would
> be an implementation in the parser that transformed f"string" into
> something like 'interpolate("string")', so that f"string" would just
> become syntactic sugar for already-existing code (and so such code
> could remain debuggable, easy to reason about, etc without any changes
> to R internals)
> 
> Thanks,
> Kevin
> 
> On Tue, Dec 7, 2021 at 2:06 PM Simon Urbanek
> <simon.urbanek using r-project.org> wrote:
>> 
>> I don't think a custom type alone would work, because users would expect to use such string anywhere a regular string can be used, and that's where the problems start - the evaluation would have to happen at a point where it is not expected since we can assume today that CHAR() doesn't evaluate. If it's just a construct that needs some function call to turn it into a real string, then that's (from user's perspective) no different than glue() so I don't think the users would see the benefit (admittedly, you could do a lot more with such internal type, but not sure if the complexity is worth it).
>> 
>> Cheers,
>> Simon
>> 
>> 
>> 
>>> On Dec 8, 2021, at 12:56 AM, Taras Zakharko <taras.zakharko using uzh.ch> wrote:
>>> 
>>> I fully agree! General string interpolation opens a gaping security hole and is accompanied by all kinds of problems and decisions. What I envision instead is something like this:
>>> 
>>>  f"hello {name}"
>>> 
>>> Which gets parsed by R to this:
>>> 
>>>  (STRINTERPSXP (CHARSXP (PROMISE nil)))
>>> 
>>> Basically, a new type of R language construct that still can be processed by packages (for customized interpolation like in cli etc.), with a default eval which is basically paste0(). The benefit here would be that this is eagerly parsed and syntactically checked, and that the promise code could carry a srcref. And of course, that you could pass an interpolated string expression lazily between frames without losing the environment etc… For more advanced applications, a low level string interpolation expression constructor could be provided (that could either parse a general string — at the user’s risk, or build it directly from expressions).
>>> 
>>> — Taras
>>> 
>>> 




