[Rd] Multi-line string constants: proposed patch

Kevin Wright kwright at eskimo.com
Fri Sep 10 22:30:21 CEST 2004


R 1.9.1 requires multi-line strings to contain a backslash at the
end of each line (except the last line).  As noted by Mark
Bravington (http://tolstoy.newcastle.edu.au/R/help/02b/5199.html)
this requirement appears to be undocumented.

In S-Plus 6.2, multi-line strings do not need a backslash for continuation.

I recently (http://tolstoy.newcastle.edu.au/R/devel/04b/0256.html)
requested compatibility with S-Plus and was told to contribute
a patch, which would then be considered.  Here is the proposed patch.
  
In the files src/main/gram.y and src/main/gram.c strings
are parsed with the StringValue function.  Looking at the function it is
clear that a literal newline character (not the two-character escape
sequence '\n') generates an error:

static int StringValue(int c)
{
...
	if (c == '\n') {
	    xxungetc(c);
	    return ERROR;
	}

...
}

I tracked this code down and Mark Bravington confirmed (by building r-devel 
on Windows) that commenting out the four lines that start with
  if (c == '\n')
will allow R to handle multi-line strings either with or without backslashes
for continuation.  A 'diff' appears at the end of this message.
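
To make the effect of those four lines concrete, here is a minimal,
self-contained sketch of a string scanner.  It is not R's StringValue:
the function name scan_string, its signature, and the allow_newline
flag are all hypothetical, and the escape handling is reduced to a few
cases.  With allow_newline set to 0 it mimics the current behavior
(a raw newline is an error); with 1 it mimics the patched behavior.
In both modes a backslash followed by a newline is dropped, matching
what R 1.9.x does with backslash continuation.

```c
#include <stddef.h>

/* Hypothetical stand-in for a parser's string reader; NOT R's StringValue.
 * Reads a double-quoted literal starting at src and stores its value in out.
 * Returns the number of characters stored, or -1 on error. */
int scan_string(const char *src, char *out, size_t outsize, int allow_newline)
{
    size_t n = 0;

    if (outsize == 0 || *src != '"')
        return -1;
    src++;                            /* skip the opening quote */

    while (*src != '"') {
        char c = *src++;
        if (c == '\0')
            return -1;                /* input ended before the closing quote */
        if (c == '\n' && !allow_newline)
            return -1;                /* analogue of the check the patch disables */
        if (c == '\\') {
            char e = *src++;
            if (e == '\0')
                return -1;            /* dangling backslash at end of input */
            if (e == '\n')
                continue;             /* backslash-newline: drop both characters */
            c = (e == 'n') ? '\n' : (e == 't') ? '\t' : e;
        }
        if (n + 1 >= outsize)
            return -1;                /* output buffer too small */
        out[n++] = c;
    }
    out[n] = '\0';
    return (int)n;
}
```

Called on the text of example f2 below, scan_string fails when
allow_newline is 0 and returns the two-line string when it is 1; the
backslash form from f1 is accepted in both modes.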

Note that if EOF is encountered while R thinks it's reading a string, it 
will silently add the string terminator rather than causing an error. 
I can't really see this as undesirable but I suppose we should mention it. 
(Currently the same thing happens if the last character of the last line is 
a backslash, so it's "consistent" anyway.)
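
The end-of-input case can be illustrated with a small sketch.  Again
this is not R source: read_string_body and its terminated flag are
hypothetical names.  Once nothing stops the loop at a newline, the
only ways out are the closing quote or the end of the input, and the
flag shows how a caller could tell the two apart if a warning or
error were wanted.

```c
#include <stddef.h>

/* Hypothetical illustration, not R source.  Copies the body of a string
 * literal (the text after the opening quote) from src into out.  Newlines
 * are copied like any other character, so the loop ends only at the
 * closing quote or at the end of the input.
 * *terminated is set to 1 if a closing quote was seen, 0 if input ran out.
 * Returns the number of characters stored. */
size_t read_string_body(const char *src, char *out, size_t outsize,
                        int *terminated)
{
    size_t n = 0;

    *terminated = 0;
    while (*src != '\0') {
        if (*src == '"') {
            *terminated = 1;          /* normal end of the literal */
            break;
        }
        if (n + 1 < outsize)
            out[n++] = *src;          /* keep the character, newline or not */
        src++;
    }
    if (outsize > 0)
        out[n] = '\0';                /* result is a valid string either way */
    return n;
}
```

A caller that ignores the flag gets the "silently terminated"
behavior described above; one that checks it could report the
unterminated string instead.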

I've searched through the S-Plus and R documentation.  Here are a few
relevant passages:

S-Plus 6.2 Programmer's Guide, Page 11
"character strings [are] enclosed by double quotes or apostrophes"

S-Plus 6.2 Programmer's Guide, Page 947 (abbreviated)
"Strings consist of zero or more characters typed between
two apostrophes or double quotes.  Table 23.2 lists some special
characters for use in string literals.  These special characters are
for string control, obtaining characters that are not represented on
the keyboard, or delimiting character strings.
  \t tab
  \\ backslash
  \n newline"

R Language Definition
String constants are delimited by a pair of single (') or double (")
quotes and can contain all other printable characters.  Quotes and
other special characters within strings are specified using escape
sequences.

Here are some simple examples:

f1 <- function(){
  # This function generates a warning in S-Plus that "the initial
  # backslash is ignored", but then is read in as intended (two lines).
  # In R 1.9.0 this becomes one line: "function 1 text"
  l1 <- "function \
1 text"
}

f2 <- function(){
  # This fails in R 1.9.0.  Works fine in S-Plus 6.2
  l2 <- "function
2 text"
}

f3 <- function(){
  # Identical in R and S-Plus
  l3 <- "function \n3 text"
}
  

Mark Bravington supplied the diff and writes:

I've now gotten R-devel to build, and your patch works fine.  I just
commented out the code rather than deleting it, though the R team might
want to do that differently.  I renamed 'gram.c' to 'ogram.c' and ran
'diff'; here's the output (yes, it is trivial):

C:\R\R-devel\src\main>diff ogram.c gram.c
3122,3125c3122,3125
<       if (c == '\n') {
<           xxungetc(c);
<           return ERROR;
<       }
---
> //    if (c == '\n') {
> //        xxungetc(c);
> //        return ERROR;
> //    }

Presumably the same change should be made in gram.y.


Thanks for considering this patch.

Kevin Wright ( & Mark Bravington )
