[Rd] Improving string concatenation
Hervé Pagès
hpages at fredhutch.org
Thu Jun 18 00:55:21 CEST 2015
Hi Bill,
On 06/17/2015 12:36 PM, William Dunlap wrote:
> If '+' and paste don't change their behavior with respect to
> factors but you encourage people to use '+' instead of paste
> then you will run into problems with data.frame columns because
> many people don't notice whether a character-like column is
> character or factor. With paste() this is not a problem but with '+'
> it is. I think it is good not to make people worry about this much.
>
> As for the recycling issue, consider calls involving NULL arguments,
> > f <- function(n)paste0(n, " test", if(n!=1)"s", " failed")
> > f(1)
> [1] "1 test failed"
> > f(0)
> [1] "0 tests failed"
> If paste0 followed the same recycling rules as "+" then f(1) would return
> character(0). There is a fair bit of code like that on CRAN.
OTOH, a very common use case is to use paste() (or paste0()) to add a given
prefix (or suffix) to a bunch of strings:

    paste0("ID", x)  # buggy! (returns "ID", not character(0), if length(x) is 0)

This is like "adding" something to 'x', so it's conceptually no different
from doing:

    x + 5

which does the right thing when 'x' is a numeric(0).
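To make the difference concrete, here is a small sketch of the two behaviors side by side. The helper add_prefix() is hypothetical (it is not part of base R) and only illustrates one way to guard the empty case by hand:

```r
x <- character(0)

# paste0() treats a zero-length argument as "" when another argument is
# non-empty, so a single string comes back instead of an empty vector
paste0("ID", x)
length(paste0("ID", x))        # 1

# arithmetic recycling: a zero-length operand gives a zero-length result
length(numeric(0) + 5)         # 0

# hypothetical helper (not in base R) that guards the empty case explicitly
add_prefix <- function(prefix, x) {
  if (length(x) == 0L) character(0) else paste0(prefix, x)
}

add_prefix("ID", x)            # character(0)
add_prefix("ID", c("1", "2"))  # "ID1" "ID2"
```

With the guard in place, downstream code that loops over the result sees zero elements for zero inputs, which is what x + 5 already gives you for free.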
Anyway, I don't think anybody suggested changing the recycling rules
of paste() or paste0() (which would of course break some existing code
that relies on them, but that's a very generic statement, right?). The
suggestion was only to adopt the recycling rules of `+` and the other
binary arithmetic and comparison operators if `+` were used to
concatenate strings.
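As a rough illustration of what that could look like (my own sketch under stated assumptions, not a proposed patch to base R), here is a hypothetical %+% operator that concatenates character vectors and returns character(0) whenever either operand has length zero, the way arithmetic operators do. Rejecting every non-character operand with an error is a simplification chosen for this sketch:

```r
# hypothetical operator, not part of base R: string concatenation that
# follows the zero-length rule of the arithmetic operators
`%+%` <- function(e1, e2) {
  # only plain character vectors are accepted (assumption of this sketch);
  # numbers and factors are rejected rather than coerced
  stopifnot(is.character(e1), is.character(e2))
  # a zero-length operand yields a zero-length result, as with x + 5
  if (length(e1) == 0L || length(e2) == 0L) {
    return(character(0))
  }
  # otherwise paste0() recycles the shorter operand to the longer one
  paste0(e1, e2)
}

c("a", "b") %+% "x"      # "ax" "bx"
character(0) %+% "x"     # character(0)
```

Note that this does not reproduce everything `+` does: arithmetic warns when the longer length is not a multiple of the shorter one, while paste0() recycles silently, so a faithful version would need an extra length check.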
Cheers,
H.
>
> Consider using sprintf() to get the sort of recycling rules that "+" uses
> > sprintf("%s is %d", c("One","Two"), numeric(0))
> character(0)
> > sprintf("%s is %d", c("One","Two"), 17)
> [1] "One is 17" "Two is 17"
> > sprintf("%s is %d", c("One","Two"), 26:27)
> [1] "One is 26" "Two is 27"
>
>
>
> Bill Dunlap
> TIBCO Software
> wdunlap tibco.com
>
> On Wed, Jun 17, 2015 at 9:56 AM, Gábor Csárdi <csardi.gabor at gmail.com>
> wrote:
>
>> On Wed, Jun 17, 2015 at 12:45 PM, William Dunlap <wdunlap at tibco.com>
>> wrote:
>>>> ... adding the ability to concat strings with '+' would be a
>>>> relatively simple addition (no pun intended) to the code base I
>>>> believe. With a lot of other languages supporting this kind of
>>>> concatenation, this is what surprised me most when first learning R.
>>>
>>> Wow! R has a lot of surprising features and I would have thought
>>> this would be quite a way down the list.
>>
>> Well, it is hard to guess what users and people in general find
>> surprising. As '+' is used for string concatenation in essentially all
>> major scripting (and many other) languages, personally I am not
>> surprised that this is surprising for people. :)
>>
>>> How would this new '+' deal with factors, as paste does or as the
>>> current '+' does?
>>
>> The same as before. It would not change the behavior for other
>> classes, only basic characters.
>>
>>> Would number+string and string+number cause errors (as in current
>>> '+' in R and python) or coerce both to strings (as in current R:paste and
>>> in perl's '+').
>>
>> Would cause errors, exactly as it does right now.
>>
>>> Having '+' work on all types of data can let improperly imported data
>>> get further into the system before triggering an error.
>>
>> Nobody is asking for this. Only characters, not all types of data.
>>
>>> I see lots of errors reported on this list that are due to
>>> read.table interpreting text as character strings instead of the
>>> numbers that the user expected. Detecting that error as early as
>>> possible is good.
>>
>> Isn't that a problem with read.table then? Detecting it there would be
>> the earliest possible, no?
>>
>> Gabor
>>
>> [...]
>>
>
> ______________________________________________
> R-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>
--
Hervé Pagès
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024
E-mail: hpages at fredhutch.org
Phone: (206) 667-5791
Fax: (206) 667-1319