[Rd] paste(character(0), collapse="", recycle0=FALSE) should be ""
Martin Maechler
m@ech|er @end|ng |rom @t@t@m@th@ethz@ch
Tue May 26 15:24:03 CEST 2020
>>>>> Hervé Pagès
>>>>> on Sun, 24 May 2020 14:22:37 -0700 writes:
> On 5/24/20 00:26, Gabriel Becker wrote:
>>
>>
>> On Sat, May 23, 2020 at 9:59 PM Hervé Pagès <hpages using fredhutch.org> wrote:
>>
>> On 5/23/20 17:45, Gabriel Becker wrote:
>> > Maybe my intuition is just
>> > different but when I collapse multiple character vectors together, I
>> > expect all the characters from each of those vectors to be in the
>> > resulting collapsed one.
>>
>> Yes I'd expect that too. But the **collapse** operation in paste() has
>> never been about collapsing **multiple** character vectors together.
>> What it does is collapse the **single** character vector that comes out
>> of the 'sep' operation.
>>
>>
>> I understand what it does, I broke it down the same way in my post
>> earlier in the thread. The fact remains that it is a single function,
>> which significantly muddies the waters. So you can say
>>
>> paste0(x,y, collapse=",", recycle0=TRUE)
>>
>> is not a collapse operation on multiple vectors, and of course there's a
>> sense in which you're not wrong (again I understand what these functions
>> do), but it sure looks like one in the invocation, doesn't it?
>>
>> Honestly the thing that this whole discussion has shown me most clearly
>> is that, imho, collapse (accepting ONLY one data vector) and
>> paste (accepting multiple) should never have been a single function to
>> begin with. But that ship sailed long, long ago.
> Yes :-(
>>
>> So
>>
>> paste(x, y, z, sep="", collapse=",")
>>
>> is analogous to
>>
>> sum(x + y + z)
>>
>>
>> Honestly, I'd be significantly more comfortable if
>>
>> 1:10 + integer(0) + 5
>>
>> were an error too.
> This is actually the recycling scheme used by mapply():
>> mapply(function(x, y, z) c(x, y, z), 1:10, integer(0), 5)
> Error in mapply(FUN = FUN, ...) :
> zero-length inputs cannot be mixed with those of non-zero length
> AFAIK base R uses 3 different recycling schemes for n-ary operations:
> (1) The recycling scheme used by arithmetic and comparison operations
> (Arith, Compare, Logic group generics).
> (2) The recycling scheme used by classic paste().
> (3) The recycling scheme used by mapply().
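To make the three schemes concrete, here is a minimal sketch in base R. The inputs are arbitrary illustrations, and the paste0() line shows the classic default behavior (recycle0 = FALSE).

```r
# (1) Arithmetic: a zero-length operand makes the whole result zero-length.
arith <- 1:10 + integer(0) + 5
length(arith)                      # 0

# (2) Classic paste(): a zero-length argument is treated as "" as long as
#     some other argument has non-zero length.
classic <- paste0("a", character(0))
classic                            # "a"

# (3) mapply(): mixing zero-length and non-zero-length inputs is an error.
strict <- tryCatch(
  mapply(function(x, y) c(x, y), 1:10, integer(0)),
  error = function(e) e
)
inherits(strict, "error")          # TRUE
```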
> Having such a core mechanism like recycling being inconsistent across
> base R is sad. It makes it really hard to predict how a given n-ary
> function will recycle its arguments unless you spend some time trying it
> yourself with several combinations of vector lengths. It is of course
> the source of numerous latent bugs. I wish there was only one but that's
> just a dream.
> None of these 3 recycling schemes is perfect. IMO (2) is by far the
> worst. (3) is too restrictive and would need to be refined if we wanted
> to make it a good universal recycling scheme.
> Anyway I don't think it makes sense to introduce a 4th recycling scheme
> at this point even though it would be a nice item to put on the wish
> list for R 7.0.0 with the ultimate goal that it will be universally adopted
> in R 11.0.0 ;-)
> So if we have to make do with what we have, IMO (1) is the scheme that makes
> most sense although I agree that it can do some surprising things for
> some unusual combinations of vector lengths. It's the scheme I adhere to
> in my own binary operations e.g. in S4Vectors::pcompare().
> The modest proposal of the 'recycle0' argument is only to let the user
> switch from recycling scheme (2) to (1) if they're not happy with scheme
> (2) (I'm one of them).
Yes, indeed. This was the purpose of introducing 'recycle0'.
Now, with collapse = <string> {in R, "string" := character vector of length 1},
we clearly see different interpretations of what is desirable
for recycle0 = TRUE:
all of you (Suharto, Bill, Hervé, Gabe) assert that the behavior
should be different from what it is now, and should either be an error
(possibly, by Gabe) or return a single string (possibly with a warning),
i.e., the collapse = <string> behavior should not be influenced by
(or possibly conflict with) recycle0 = TRUE.
Within R core, some believe the current recycle0 = TRUE behavior to
be the correct one. Personally, I see reasons for both.
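For reference, a small sketch of what recycle0 changes when no 'collapse' is given, assuming an R version whose paste() has the 'recycle0' argument. The collapse case is deliberately left without an expected result, since that is exactly the point under discussion; 'x' and 'y' are arbitrary example inputs.

```r
x <- c("a", "b")
y <- character(0)

# Classic scheme: the zero-length 'y' is treated as "".
sep_classic <- paste0(x, y, recycle0 = FALSE)    # "a" "b"

# With recycle0 = TRUE, the zero-length 'y' makes the 'sep' step zero-length.
sep_recycle0 <- paste0(x, y, recycle0 = TRUE)    # character(0)

# The contested case: should this be character(0), "", or an error?
contested <- paste0(x, y, collapse = ",", recycle0 = TRUE)
```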
What about remaining back-compatible, not only to R 3.y.z with
default recycle0=FALSE, but also to R 4.0.0 with recycle0=TRUE
*and* add a new option for the Suharto-Bill-Hervé-Gabe behavior,
e.g., recycle0="sep.only" or just recycle0="sep" ?
As (for back-compatibility reasons) you have to specify
'recycle0 = ..' anyway, you would get what makes most sense to
you by using such a third option.
? (WDYT ?)
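The behavior such a third option would give can be emulated by doing the two steps separately. The sketch below is only an illustration: paste_sep_only() is a hypothetical helper name, not an existing or proposed function, and it assumes an R version whose paste() has the 'recycle0' argument.

```r
# Hypothetical helper: apply recycle0 = TRUE to the 'sep' step only,
# then collapse the resulting single character vector the classic way.
paste_sep_only <- function(..., sep = " ", collapse = NULL) {
  pieces <- paste(..., sep = sep, recycle0 = TRUE)
  if (is.null(collapse)) {
    return(pieces)
  }
  # Collapsing always yields exactly one string; character(0) becomes "".
  paste(pieces, collapse = collapse)
}

paste_sep_only("a", character(0))                       # character(0)
paste_sep_only("a", character(0), collapse = ",")       # ""
paste_sep_only(c("a", "b"), "x", sep = "", collapse = ",")  # "ax,bx"
```

With this split, 'collapse' keeps its guarantee of returning a single string, while zero-length inputs still propagate through the 'sep' step.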
Martin
> Switching to scheme (3) or to a new custom scheme
> would be a completely different proposal.
>>
>> At least I'm consistent right?
> Yes :-)
> Anyway, discussing recycling schemes is interesting but not directly
> related to what the OP brought up (behavior of the 'collapse' operation).
> Cheers,
> H.
>>
>> ~G