[Rd] [WISH / PATCH] possibility to split string literals across multiple lines

Wed Jun 14 17:05:19 CEST 2017

-------- Original Message --------
From: Duncan Murdoch [mailto:murdoch.duncan at gmail.com]
Sent: Wednesday, Jun 14, 2017 1:36 PM GMT
To: Andreas Kersting
Cc: r-devel
Subject: [Rd] [WISH / PATCH] possibility to split string literals across multiple lines

> On 14/06/2017 6:45 AM, Andreas Kersting wrote:
>> On Wed, 14 Jun 2017 06:12:09 -0500, Duncan Murdoch
>> <murdoch.duncan at gmail.com> wrote:
>>
>>> On 14/06/2017 5:58 AM, Andreas Kersting wrote:
>>>> Hi,
>>>>
>>>> I would really like to have a way to split long string literals across
>>>> multiple lines in R.
>>>
>>> I don't understand why you require the string to be a literal.  Why not
>>> construct the long string in an expression like
>>>
>>>   paste0("aaa",
>>>          "bbb")
>>>
>>> ?  Surely the execution time of the paste0 call is negligible.
>>>
>>> Duncan Murdoch
>>
>> Actually "execution time" is precisely one of the reasons why I would
>> like to see this feature as - depending on the context (e.g. in a
>> tight loop) - the execution time of paste0 (or probably also glue,
>> thanks Gabor) is not necessarily insignificant.
>
> You also need to consider implementation time.  This is not just changes
> to R itself; trailing backslashes *are* used in some packages (e.g.
> geoparser), so those packages would need to be identified and modified
> and resubmitted to CRAN.

I am totally with you on this "runtime vs. implementation time" issue.
That is why I proposed the patch as I did: it seemed to require only
minor changes to base R, and I didn't see how it could be incompatible
with existing code.

Actually, I still cannot see how a package could have *used* backslashes
immediately followed by newlines up to now, since those backslashes were
simply ignored by the parser (and changes to the function StringValue
only affect the parser, don't they?). Of course I cannot rule out the
possibility that there is code like
var <- "aaa\
bbb"
around, but this would rely on the undocumented(?) features that
"backslash newline" is a valid escape sequence and that it is treated as
"newline".
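
The current behaviour can be checked without typing a multi-line literal at the prompt. The following is a minimal sketch, assuming an unpatched R; the variable names are only illustrative. It builds the source text of such a literal, parses it, and inspects the result.

```r
# Source text of a string literal whose first line ends in a backslash:
# a double quote, aaa, one backslash, a newline, bbb, a double quote.
src <- "\"aaa\\\nbbb\""

# Parse and evaluate the literal, as the parser would when reading a file.
val <- eval(parse(text = src)[[1]])

# On an unpatched parser (the behaviour described in this thread) the
# backslash is dropped and the newline is kept.
print(val)
identical(val, "aaa\nbbb")
```

With the proposed patch applied, the same source text would be expected to evaluate to "aaabbb" instead.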

Maybe it's a good idea to show some more examples of how the patched
parser behaves. There should only be a difference from the current
implementation if a string literal spans multiple lines and a line ends
in an odd number of backslashes (see last example):

 > "aaa\\
+ bbb"
[1] "aaa\\\nbbb"

 > "aaa\\nbbb"
[1] "aaa\\nbbb"

 > "aaa\\\nbbb"
[1] "aaa\\\nbbb"

 > "aaa\\"
[1] "aaa\\"

 > "aaa\\\n"
[1] "aaa\\\n"

 > "aaa\\\\"
[1] "aaa\\\\"

 > "aaa\\\\\n"
[1] "aaa\\\\\n"

 > "aaa\\\\
+ bbb"
[1] "aaa\\\\\nbbb"

 > "aaa\\\
+ bbb"
[1] "aaa\\bbb"
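
To make the odd/even rule concrete, here is a minimal R sketch that applies it to the raw source text between the quotes, i.e. before escape sequences are interpreted. It only emulates the behaviour described above and is not the patch itself (which changes StringValue in the parser); the helper name join_continued is hypothetical.

```r
# Hypothetical helper, not part of R or of the patch: applies the proposed
# continuation rule to the raw text of a string literal body.
join_continued <- function(body) {
  # Regex, read left to right:
  #   lookbehind   : we are at the start of a run of backslashes
  #   group 1      : an even number of backslashes (complete escapes), kept
  #   remainder    : one more backslash, the newline and any leading
  #                  spaces/tabs on the next line, all dropped
  gsub("(?<!\\\\)((?:\\\\\\\\)*)\\\\\n[ \t]*", "\\1", body, perl = TRUE)
}

one      <- "aaa\\\nbbb"        # raw text: aaa, 1 backslash, newline, bbb
two      <- "aaa\\\\\nbbb"      # raw text: aaa, 2 backslashes, newline, bbb
three    <- "aaa\\\\\\\nbbb"    # raw text: aaa, 3 backslashes, newline, bbb
indented <- "aaa \\\n    bbb"   # continuation followed by indentation

res_one      <- join_continued(one)       # odd count: lines are joined
res_two      <- join_continued(two)       # even count: left unchanged
res_three    <- join_continued(three)     # odd count: last backslash consumed
res_indented <- join_continued(indented)  # leading blanks are skipped

cat(res_one, res_three, res_indented, sep = "\n")
```

Only a backslash that is not itself part of a complete escaped pair acts as a continuation marker, which is why an even number of trailing backslashes leaves the literal untouched.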

Andreas

> Core changes to existing behaviour need really strong arguments, and I'm
> just not seeing those here.
>
> Duncan Murdoch
>
>> The other reason is style: I think it is cleaner if we can construct
>> such a long string literal without the need for a function call.
>>
>> Andreas
>>
>>>>
>>>> Currently, if a string literal spans multiple lines, there is no way to
>>>> inhibit the introduction of newline characters:
>>>>
>>>>  > "aaa
>>>> + bbb"
>>>> [1] "aaa\nbbb"
>>>>
>>>>
>>>> If a line ends with a backslash, it is just ignored:
>>>>
>>>>  > "aaa\
>>>> + bbb"
>>>> [1] "aaa\nbbb"
>>>>
>>>>
>>>> We could use this fact to implement string splitting in a fairly
>>>> backward-compatible way, since currently such trailing backslashes
>>>> should hardly be used as they do not have any effect. The attached
>>>> patch makes the parser ignore a newline character directly following a
>>>> backslash:
>>>>
>>>>  > "aaa\
>>>> + bbb"
>>>> [1] "aaabbb"
>>>>
>>>>
>>>> I personally would also prefer if leading blanks (spaces and tabs) in
>>>> the second line are ignored to allow for proper indentation:
>>>>
>>>>  >   "aaa \
>>>> +    bbb"
>>>> [1] "aaa bbb"
>>>>
>>>>  >   "aaa\
>>>> +    \ bbb"
>>>> [1] "aaa bbb"
>>>>
>>>> This is also implemented by this patch.
>>>>
>>>>
>>>> An alternative approach could be to have something like
>>>>
>>>> ("aaa "
>>>> "bbb")
>>>>
>>>> or
>>>>
>>>> ("aaa ",
>>>> "bbb")
>>>>
>>>> be interpreted as "aaa bbb".
>>>>
>>>> I don't know the ins and outs of the parser of R (hence: please very
>>>> carefully review the attached patch), but I guess this would be more
>>>> work to implement!?
>>>>
>>>>
>>>> What do you think? Is there anybody else who is missing this feature in
>>>> the first place?
>>>>
>>>> Regards,
>>>> Andreas
>>>>
>>>>
>>>>
>>>> ______________________________________________
>>>> R-devel at r-project.org mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/r-devel
>>>>