[Rd] paste(character(0), collapse="", recycle0=FALSE) should be ""

Hervé Pagès hpages using fredhutch.org
Sat May 23 00:16:09 CEST 2020


Gabe,

It's the current behavior of paste() that is a major source of bugs:

   ## Add "rs" prefix to SNP ids and collapse them in a
   ## comma-separated string.
   collapse_snp_ids <- function(snp_ids)
       paste("rs", snp_ids, sep="", collapse=",")

   snp_groups <- list(
     group1=c(55, 22, 200),
     group2=integer(0),
     group3=c(99, 550)
   )

   vapply(snp_groups, collapse_snp_ids, character(1))
   #            group1            group2            group3
   # "rs55,rs22,rs200"              "rs"      "rs99,rs550"

This has hit me so many times!
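
The "rs" result for group2 comes from paste()'s default recycling, which treats a zero-length argument as "". A minimal sketch of that mechanism, together with one possible defensive workaround that does not rely on 'recycle0' at all (the helper name collapse_snp_ids_safe is hypothetical and introduced here only for illustration):

```r
## Default recycling: a zero-length argument behaves like "".
paste("rs", integer(0), sep="")
# [1] "rs"

## Hypothetical helper: handle the empty case explicitly before pasting,
## so the result does not depend on the R version or on 'recycle0'.
collapse_snp_ids_safe <- function(snp_ids) {
    if (length(snp_ids) == 0L)
        return("")
    paste("rs", snp_ids, sep="", collapse=",")
}

snp_groups <- list(
  group1=c(55, 22, 200),
  group2=integer(0),
  group3=c(99, 550)
)

safe_ids <- vapply(snp_groups, collapse_snp_ids_safe, character(1))
```

With this guard the empty group maps to "" rather than "rs", at the cost of every caller having to remember the special case.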

Now with 'recycle0=TRUE', we finally have the opportunity to make it do 
the right thing. Let's not miss that opportunity.
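
The two-step reading argued for in this thread ('recycle0' affects only the 'sep' step, and a non-NULL 'collapse' always yields exactly one string) can be sketched in plain R. This is only an illustration under those assumptions: paste_two_step is a hypothetical name, it is not how paste() is implemented, and it deliberately avoids passing 'recycle0' to paste() so that it does not depend on the R version.

```r
## Hypothetical illustration of the proposed semantics, not R's implementation.
paste_two_step <- function(..., sep = " ", collapse = NULL, recycle0 = FALSE) {
    args <- list(...)
    ## Step 1 ('sep' operation): this is the only place where recycle0 matters.
    if (recycle0 && any(lengths(args) == 0L)) {
        pieces <- character(0)
    } else {
        pieces <- do.call(paste, c(args, list(sep = sep)))
    }
    ## Step 2 ('collapse' operation): unary, so there is nothing to recycle;
    ## a non-NULL collapse always produces a single string.
    if (is.null(collapse))
        pieces
    else
        paste(pieces, collapse = collapse)
}

paste_two_step(c("a", "b"), NULL, c("c", "d"), sep = " ", collapse = ",",
               recycle0 = TRUE)
# [1] ""
paste_two_step("rs", integer(0), sep = "", recycle0 = TRUE)
# character(0)
paste_two_step("rs", integer(0), sep = "", collapse = ",")
# [1] "rs"
```

Under this reading, a helper like collapse_snp_ids could pass recycle0=TRUE and get "" for an empty group without any explicit length check.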

Cheers,
H.


On 5/22/20 11:26, Gabriel Becker wrote:
> I understand that this is consistent but it also strikes me as an 
> enormous 'gotcha' of a magnitude that 'we' are trying to avoid/smooth 
> over at this point in user-facing R space.
> 
> For the record I'm not suggesting it should return something other than 
> "", and in particular I'm not arguing that any call to paste /that does 
> not return an error/ with non-NULL collapse should return a character 
> vector of length one.
> 
> Rather I'm pointing out that it could (perhaps should, imo) simply be an 
> error, which is also consistent, in the strict sense, with 
> previous behavior in that it is the developer simply declining to extend 
> the recycle0 argument to the full parameter space (there is no rule that 
> says we must do so, arguments whose use is incompatible with other 
> arguments can be reasonable and called for).
> 
> I don't feel super strongly that returning "" in this and similar
> cases is horrible and should never happen, but I'd bet dollars to donuts
> that to the extent that behavior occurs it will be a disproportionately
> major source of bugs, and I think that's at least worth considering in
> addition to pure consistency.
> 
> ~G
> 
> On Fri, May 22, 2020 at 9:50 AM William Dunlap <wdunlap using tibco.com> wrote:
> 
>     I agree with Herve, processing collapse happens last so
>     collapse=non-NULL always leads to a single character string being
>     returned, the same as paste(collapse="").  See the altPaste function
>     I posted yesterday.
> 
>     Bill Dunlap
>     TIBCO Software
>     wdunlap tibco.com
> 
> 
>     On Fri, May 22, 2020 at 9:12 AM Hervé Pagès <hpages using fredhutch.org> wrote:
> 
>         I think that
> 
>              paste(c("a", "b"), NULL, c("c",  "d"),  sep = " ", collapse = ",",
>                    recycle0=TRUE)
> 
>         should just return an empty string and I don't see why it needs
>         to emit a warning or raise an error. To me it does exactly what
>         the user is asking for, which is to change how the 3 arguments
>         are recycled **before** the 'sep' operation.
> 
>         The 'recycle0' argument has no business in the 'collapse'
>         operation (which comes after the 'sep' operation): this
>         operation still behaves like it always has.
> 
>         That's all there is to it.
> 
>         H.
> 
> 
>         On 5/22/20 03:00, Gabriel Becker wrote:
>          > Hi Martin et al,
>          >
>          >
>          >
>          > On Thu, May 21, 2020 at 9:42 AM Martin Maechler
>          > <maechler using stat.math.ethz.ch> wrote:
>          >
>          >      >>>>> Hervé Pagès
>          >      >>>>>     on Fri, 15 May 2020 13:44:28 -0700 writes:
>          >
>          >          > There is still the situation where **both** 'sep' and
>          >          > 'collapse' are specified:
>          >
>          >          >> paste(integer(0), "nth", sep="", collapse=",")
>          >          > [1] "nth"
>          >
>          >          > In that case 'recycle0' should **not** be ignored i.e.
>          >
>          >          > paste(integer(0), "nth", sep="", collapse=",", recycle0=TRUE)
>          >
>          >          > should return the empty string (and not character(0) like
>          >          > it does at the moment).
>          >
>          >          > In other words, 'recycle0' should only control the first
>          >          > operation (the operation controlled by 'sep'). Which makes
>          >          > plenty of sense: the 1st operation is binary (or n-ary)
>          >          > while the collapse operation is unary. There is no concept
>          >          > of recycling in the context of unary operations.
>          >
>          >     Interesting, ..., and sounding somewhat convincing.
>          >
>          >          > On 5/15/20 11:25, Gabriel Becker wrote:
>          >          >> Hi all,
>          >          >>
>          >          >> This makes sense to me, but I would think that recycle0
>          >          >> and collapse should actually be incompatible and paste
>          >          >> should throw an error if recycle0 were TRUE and collapse
>          >          >> were declared in the same call. I don't think the value of
>          >          >> recycle0 should be silently ignored if it is actively
>          >          >> specified.
>          >          >>
>          >          >> ~G
>          >
>          >     Just to summarize what I think we should know and agree (or
>          >     be "disproven") and where this comes from ...
>          >
>          >     1) recycle0 is a new R 4.0.0 option in paste() / paste0() which
>          >         by default (recycle0 = FALSE) should (and *does* AFAIK) not
>          >         change anything, hence  paste() / paste0() behave completely
>          >         back-compatible if recycle0 is kept to FALSE.
>          >
>          >     2) recycle0 = TRUE is meant to give different behavior, notably
>          >         0-length arguments (among '...') should result in 0-length
>          >         results.
>          >
>          >         The above does not specify what this means in detail, see 3)
>          >
>          >     3) The current R 4.0.0 implementation (for which I'm primarily
>          >         responsible) and help(paste)  are in accordance.
>          >         Notably the help page (Arguments -> 'recycle0' ; Details 1st
>          >         para ; Examples) says and shows how the 4.0.0 implementation
>          >         has been meant to work.
>          >
>          >     4) Several provenly smart members of the R community argue that
>          >         both the implementation and the documentation of
>          >         'recycle0 = TRUE'  should be changed to be more logical /
>          >         coherent / sensical ..
>          >
>          >     Is the above all correct in your view?
>          >
>          >     Assuming yes,  I read basically two proposals, both agreeing
>          >     that  recycle0 = TRUE  should only ever apply to the action of
>          >     'sep' but not the action of 'collapse'.
>          >
>          >     1) Bill and Hervé (I think) propose that 'recycle0' should have
>          >         no effect whenever  'collapse = <string>'
>          >
>          >     2) Gabe proposes that 'collapse = <string>' and
>          >         'recycle0 = TRUE' should be declared incompatible and error.
>          >         If going in that direction, I could also see them to give a
>          >         warning (and continue as if recycle0 = FALSE).
>          >
>          >
>          > Herve makes a good point about when sep and collapse are both set.
>          > That said, if the user explicitly sets recycle0, personally, I
>          > don't think it should be silently ignored under any configuration
>          > of other arguments.
>          >
>          > If all of the arguments are to go into effect, the question then
>          > becomes one of ordering, I think.
>          >
>          > Consider
>          >
>          >     paste(c("a", "b"), NULL, c("c",  "d"),  sep = " ", collapse = ",",
>          >           recycle0=TRUE)
>          >
>          > Currently that returns character(0), because the logic is
>          > essentially (in pseudo-code)
>          >
>          >     collapse(paste(c("a", "b"), NULL, c("c",  "d"),  sep = " ",
>          >                    recycle0=TRUE), collapse = ",", recycle0=TRUE)
>          >
>          >       -> collapse(character(0), collapse = ",", recycle0=TRUE)
>          >
>          >     -> character(0)
>          >
>          > Now Bill Dunlap argued, fairly convincingly I think, that
>          > paste(..., collapse=<string>) should /always/ return a character
>          > vector of length exactly one. With recycle0, though,  it will
>          > return "" via the progression
>          >
>          >     paste(c("a", "b"), NULL, c("c",  "d"),  sep = " ", collapse = ",",
>          >           recycle0=TRUE)
>          >
>          >       -> collapse(character(0), collapse = ",")
>          >
>          >     -> ""
>          >
>          >
>          > because recycle0 is still applied to the sep-based operation which
>          > occurs before collapse, thus leaving a vector of length 0 to
>          > collapse.
>          >
>          > That is consistent but seems unlikely to be what the user wanted,
>          > imho. I think if it does this there should be at least a warning
>          > when paste collapses to "" this way, if it is allowed at all (ie if
>          > mixing collapse=<string> and recycle0=TRUE is not simply made an
>          > error).
>          >
>          > I would like to hear others' thoughts as well though. @Pages, Herve
>          > @William Dunlap is "" what you envision as the desired and useful
>          > behavior there?
>          >
>          > Best,
>          > ~G
>          >
>          >
>          >
>          >     I have not yet made my mind up but would tend to agree to "you
>          >     guys", but I think that other R Core members should chime in, too.
>          >
>          >     Martin
>          >
>          >          >> On Fri, May 15, 2020 at 11:05 AM Hervé Pagès
>          >          >> <hpages using fredhutch.org> wrote:
>          >          >>
>          >          >> Totally agree with that.
>          >          >>
>          >          >> H.
>          >          >>
>          >          >> On 5/15/20 10:34, William Dunlap via R-devel wrote:
>          >          >> > I agree: paste(collapse="something", ...) should always
>          >          >> > return a single character string, regardless of the
>          >          >> > value of recycle0. This would be similar to when there
>          >          >> > are no non-NULL arguments to paste; collapse="." gives a
>          >          >> > single empty string and collapse=NULL gives a zero long
>          >          >> > character vector.
>          >          >> >> paste()
>          >          >> > character(0)
>          >          >> >> paste(collapse=", ")
>          >          >> > [1] ""
>          >          >> >
>          >          >> > Bill Dunlap
>          >          >> > TIBCO Software
>          >          >> > wdunlap tibco.com
>          >          >> >
>          >          >> >
>          >          >> > On Thu, Apr 30, 2020 at 9:56 PM suharto_anggono--- via
>          >          >> > R-devel <r-devel using r-project.org> wrote:
>          >          >> >
>          >          >> >> Without 'collapse', 'paste' pastes (concatenates) its
>          >          >> >> arguments elementwise (separated by 'sep', " " by
>          >          >> >> default). New in R devel and R patched, specifying
>          >          >> >> recycle0 = TRUE makes mixing zero-length and
>          >          >> >> nonzero-length arguments result in length zero. The
>          >          >> >> result of paste(n, "th", sep = "", recycle0 = TRUE)
>          >          >> >> always has the same length as 'n'. Previously, the
>          >          >> >> result was still as long as the longest argument, with
>          >          >> >> the zero-length argument treated like "". If all of the
>          >          >> >> arguments have length zero, 'recycle0' doesn't matter.
>          >          >> >>
>          >          >> >> As far as I understand, 'paste' with 'collapse' as a
>          >          >> >> character string is supposed to put together elements
>          >          >> >> of a vector into a single character string. I think
>          >          >> >> 'recycle0' shouldn't change it.
>          >          >> >>
>          >          >> >> In current R devel and R patched, paste(character(0),
>          >          >> >> collapse = "", recycle0 = TRUE) is character(0). I think
>          >          >> >> it should be "", like paste(character(0), collapse="").
>          >          >> >>
>          >          >> >> paste(c("4", "5"), "th", sep = "", collapse = ", ", recycle0 = TRUE)
>          >          >> >> is
>          >          >> >> "4th, 5th".
>          >          >> >> paste(c("4"     ), "th", sep = "", collapse = ", ", recycle0 = TRUE)
>          >          >> >> is
>          >          >> >> "4th".
>          >          >> >> I think
>          >          >> >> paste(c(        ), "th", sep = "", collapse = ", ", recycle0 = TRUE)
>          >          >> >> should be
>          >          >> >> "",
>          >          >> >> not character(0).
>          >          >> >>
>          >          >> >> ______________________________________________
>          >          >> >> R-devel using r-project.org mailing list
>          >          >> >> https://stat.ethz.ch/mailman/listinfo/r-devel
>          >          >> >>
>          >          >> >
>          >          >> >       [[alternative HTML version deleted]]
>          >          >> >
>          >          >>
>          >          >> --
>          >          >> Hervé Pagès
>          >          >>
>          >          >> Program in Computational Biology
>          >          >> Division of Public Health Sciences
>          >          >> Fred Hutchinson Cancer Research Center
>          >          >> 1100 Fairview Ave. N, M1-B514
>          >          >> P.O. Box 19024
>          >          >> Seattle, WA 98109-1024
>          >          >>
>          >          >> E-mail: hpages using fredhutch.org
>          >          >> Phone:  (206) 667-5791
>          >          >> Fax:    (206) 667-1319
>          >          >>
>          >
> 
> 

-- 
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpages using fredhutch.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319


