[Rd] Improving string concatenation
wdunlap at tibco.com
Wed Jun 17 21:36:59 CEST 2015
If '+' and paste don't change their behavior with respect to
factors but you encourage people to use '+' instead of paste,
then you will run into problems with data.frame columns, because
many people don't notice whether a character-like column is
character or factor. With paste() this is not a problem, but with '+'
it is. I think it is good not to make people worry about this much.
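A minimal sketch of the factor concern follows. The data frame and its column name are made up for illustration, and stringsAsFactors is set explicitly so the example does not depend on the default:

```r
# Hypothetical data: 'id' looks like text but is stored as a factor.
df <- data.frame(id = c("a", "b"), stringsAsFactors = TRUE)
is_fac <- is.factor(df$id)

# paste() converts the factor to its labels, so the result is the same
# as it would be for a character column.
pasted <- paste(df$id, "x", sep = "_")

# The current '+' is not meaningful for factors: it warns and returns NA.
added <- suppressWarnings(df$id + 1L)
```

If '+' concatenated plain character vectors but kept this behavior for factors, the same expression would succeed or yield NAs depending on how the column happened to be stored.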
As for the recycling issue, consider calls involving NULL arguments:
> f <- function(n)paste0(n, " test", if(n!=1)"s", " failed")
> f(1)
[1] "1 test failed"
> f(0)
[1] "0 tests failed"
If paste0 followed the same recycling rules as "+" then f(1) would return
character(0). There is a fair bit of code like that on CRAN.
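To spell out why, here is a small sketch using only base R; the variable names are illustrative. An if() without an else branch yields NULL when its condition is FALSE, paste0() skips that zero-length argument, and arithmetic operators return a zero-length result when either operand has length zero:

```r
f <- function(n) paste0(n, " test", if (n != 1) "s", " failed")

# if() with a FALSE condition and no else branch evaluates to NULL.
suffix <- if (FALSE) "s"
suffix_is_null <- is.null(suffix)

# paste0() effectively ignores the NULL, so the result still has length 1.
one <- f(1)
zero <- f(0)

# Arithmetic-style recycling: a zero-length operand gives a zero-length result.
arith_len <- length(1 + numeric(0))
```

Under arithmetic-style rules the NULL inside f(1) would play the role of numeric(0) above, and the whole result would collapse to length zero.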
Consider using sprintf() to get the sort of recycling rules that "+" uses:
> sprintf("%s is %d", c("One","Two"), numeric(0))
character(0)
> sprintf("%s is %d", c("One","Two"), 17)
[1] "One is 17" "Two is 17"
> sprintf("%s is %d", c("One","Two"), 26:27)
[1] "One is 26" "Two is 27"
On Wed, Jun 17, 2015 at 9:56 AM, Gábor Csárdi <csardi.gabor at gmail.com> wrote:
> On Wed, Jun 17, 2015 at 12:45 PM, William Dunlap <wdunlap at tibco.com> wrote:
> >> ... adding the ability to concat
> >> strings with '+' would be a relatively simple addition (no pun intended) to
> >> the code base I believe. With a lot of other languages supporting this kind
> >> of concatenation, this is what surprised me most when first learning R.
> > Wow! R has a lot of surprising features and I would have thought
> > this would be quite a way down the list.
> Well, it is hard to guess what users and people in general find
> surprising. As '+' is used for string concatenation in essentially all
> major scripting (and many other) languages, personally I am not
> surprised that this is surprising for people. :)
> > How would this new '+' deal with factors, as paste does or as the current
> > '+' does?
> The same as before. It would not change the behavior for other
> classes, only basic characters.
> > Would number+string and string+number cause errors (as in current
> > '+' in R and python) or coerce both to strings (as in current R:paste and
> > in perl's '+').
> Would cause errors, exactly as it does right now.
> > Having '+' work on all types of data can let improperly imported data
> > get further into the system before triggering an error.
> Nobody is asking for this. Only characters, not all types of data.
> > I see lots of errors reported on this list that are due to read.table
> > interpreting text as character strings instead of the numbers that the
> > user expected. Detecting that error as early as possible is good.
> Isn't that a problem with read.table then? Detecting it there would be
> the earliest possible, no?