[Rd] Improving string concatenation

William Dunlap wdunlap at tibco.com
Wed Jun 17 21:36:59 CEST 2015


If '+' and paste() don't change their behavior with respect to
factors, but you encourage people to use '+' instead of paste(),
then you will run into problems with data.frame columns, because
many people don't notice whether a character-like column is
character or factor.  With paste() this is not a problem, but with '+'
it is.  I think it is good not to make people worry about this much.
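
To illustrate the difference, here is a minimal sketch using a made-up
data frame (the column name and values are only for illustration).
stringsAsFactors = TRUE is passed explicitly so the column is a factor
regardless of the session default:

```r
# Hypothetical data: 'name' looks like text but is stored as a factor.
df <- data.frame(name = c("a", "b"), stringsAsFactors = TRUE)

# paste0() works from the factor's labels, so the result is what most
# people expect.
pasted <- paste0(df$name, "_x")

# The current '+' is not meaningful for factors: it warns and returns NA.
added <- suppressWarnings(df$name + 1)
```

Here pasted holds the labels with "_x" appended, while added is all NA.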

As for the recycling issue, consider calls involving NULL arguments:
  > f <- function(n)paste0(n, " test", if(n!=1)"s", " failed")
  > f(1)
  [1] "1 test failed"
  > f(0)
  [1] "0 tests failed"
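In f(1) the expression if(n!=1)"s" evaluates to NULL, a zero-length
argument that paste0 simply drops.  If paste0 followed the same recycling
rules as "+" then f(1) would return character(0).  There is a fair bit of
code like that on CRAN.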
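
A rough sketch of what that would look like.  strict_paste0 and g are
hypothetical names, not anything in base R; strict_paste0 just imitates
the arithmetic rule that any zero-length argument gives a zero-length
result:

```r
f <- function(n) paste0(n, " test", if (n != 1) "s", " failed")

# Hypothetical variant of paste0() with arithmetic-style recycling:
# any zero-length argument makes the whole result zero-length.
strict_paste0 <- function(...) {
  args <- list(...)
  if (any(lengths(args) == 0L)) return(character(0))
  do.call(paste0, args)
}

# Same as f(), but built on the strict variant.
g <- function(n) strict_paste0(n, " test", if (n != 1) "s", " failed")

f(1)  # paste0() drops the NULL from the if()
g(1)  # the NULL makes the whole result zero-length
g(0)  # no NULL here, so both versions agree
```

Code written in the style of f() would silently start returning empty
results under the stricter rule.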

Consider using sprintf() to get the sort of recycling rules that "+" uses
(a zero-length argument gives a zero-length result):
  > sprintf("%s is %d", c("One","Two"), numeric(0))
  character(0)
  > sprintf("%s is %d", c("One","Two"), 17)
  [1] "One is 17" "Two is 17"
  > sprintf("%s is %d", c("One","Two"), 26:27)
  [1] "One is 26" "Two is 27"
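
If an infix form is wanted, one possible sketch is a user-defined
operator built on sprintf().  The name %+% is an assumption made for
illustration, not something in base R, and the is.character() checks
are one way to keep number+string an error:

```r
# Hypothetical operator, not part of base R: concatenate two character
# vectors with sprintf(), which returns a zero-length result if either
# argument is zero-length.
`%+%` <- function(a, b) {
  stopifnot(is.character(a), is.character(b))
  sprintf("%s%s", a, b)
}

c("One", "Two") %+% " test"       # recycles the length-1 argument
c("One", "Two") %+% character(0)  # zero-length in, zero-length out
```

This leaves the built-in '+' untouched, so factors and numbers behave
exactly as they do now.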



Bill Dunlap
TIBCO Software
wdunlap tibco.com

On Wed, Jun 17, 2015 at 9:56 AM, Gábor Csárdi <csardi.gabor at gmail.com>
wrote:

> On Wed, Jun 17, 2015 at 12:45 PM, William Dunlap <wdunlap at tibco.com>
> wrote:
> >> ... adding the ability to concat
> >> strings with '+' would be a relatively simple addition (no pun intended)
> >> to the code base I believe. With a lot of other languages supporting this
> >> kind of concatenation, this is what surprised me most when first
> >> learning R.
> >
> > Wow!  R has a lot of surprising features and I would have thought
> > this would be quite a way down the list.
>
> Well, it is hard to guess what users and people in general find
> surprising. As '+' is used for string concatenation in essentially all
> major scripting (and many other) languages, personally I am not
> surprised that this is surprising for people. :)
>
> > How would this new '+' deal with factors, as paste does or as the
> > current '+' does?
>
> The same as before. It would not change the behavior for other
> classes, only basic characters.
>
> > Would number+string and string+number cause errors (as in current
> > '+' in R and Python) or coerce both to strings (as in current R:paste and
> > in Perl's '+')?
>
> Would cause errors, exactly as it does right now.
>
> > Having '+' work on all types of data can let improperly imported data
> > get further into the system before triggering an error.
>
> Nobody is asking for this. Only characters, not all types of data.
>
> > I see lots of errors reported on this list that are due to read.table
> > interpreting text as character strings instead of the numbers that the
> > user expected.  Detecting that error as early as possible is good.
>
> Isn't that a problem with read.table then? Detecting it there would be
> the earliest possible, no?
>
> Gabor
>
> [...]
>
