[Rd] paste(character(0), collapse="", recycle0=FALSE) should be ""
Gabriel Becker
g@bembecker @end|ng |rom gm@||@com
Sun May 24 02:45:34 CEST 2020
Herve (et al.),
On Fri, May 22, 2020 at 3:16 PM Hervé Pagès <hpages using fredhutch.org> wrote:
> Gabe,
>
> It's the current behavior of paste() that is a major source of bugs:
>
> ## Add "rs" prefix to SNP ids and collapse them in a
> ## comma-separated string.
> collapse_snp_ids <- function(snp_ids)
>     paste("rs", snp_ids, sep="", collapse=",")
>
> snp_groups <- list(
>     group1=c(55, 22, 200),
>     group2=integer(0),
>     group3=c(99, 550)
> )
>
> vapply(snp_groups, collapse_snp_ids, character(1))
> #            group1            group2            group3
> # "rs55,rs22,rs200"              "rs"      "rs99,rs550"
>
> This has hit me so many times!
>
> Now with 'recycle0=TRUE', we finally have the opportunity to make it do
> the right thing. Let's not miss that opportunity.
>
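For reference, here is a minimal sketch of the quoted example under the default recycle0 = FALSE, next to one possible guard. The helper name collapse_snp_ids_safe is hypothetical and not part of the original example; it only assumes that an empty group should map to "".

```r
# Quoted example, run with the default recycle0 = FALSE.
collapse_snp_ids <- function(snp_ids)
  paste("rs", snp_ids, sep = "", collapse = ",")

# Hypothetical guarded variant (an assumption for illustration):
# return "" explicitly for empty input, so the result does not depend
# on how paste() treats zero-length arguments.
collapse_snp_ids_safe <- function(snp_ids) {
  if (length(snp_ids) == 0L) return("")
  paste("rs", snp_ids, sep = "", collapse = ",")
}

snp_groups <- list(
  group1 = c(55, 22, 200),
  group2 = integer(0),
  group3 = c(99, 550)
)

vapply(snp_groups, collapse_snp_ids, character(1))
#            group1            group2            group3
# "rs55,rs22,rs200"              "rs"      "rs99,rs550"

vapply(snp_groups, collapse_snp_ids_safe, character(1))
#            group1            group2            group3
# "rs55,rs22,rs200"                ""      "rs99,rs550"
```

The guard sidesteps the question of what paste() itself should do, at the cost of repeating the check in every such helper.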
I see what you're saying, but I don't know. Maybe my intuition is just
different, but when I collapse multiple character vectors together, I
expect all the characters from each of those vectors to be in the resulting
collapsed one. In your example it's a string literal being added
elementwise as a prefix, but what if it is another vector of length > 1?
Wouldn't it be strange that all those values are wiped and absent from the
resulting string? Maybe it's just me. For example, in paste(x, y, z, sep = "",
collapse = ", ", recycle0 = TRUE), if length(y) is 0, it literally makes no
difference what x and z are.
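
To make that concrete, here is a rough sketch of the two-step reading ('sep' first, then 'collapse'). The helper paste_two_step is hypothetical, not part of base R, and deliberately avoids the 'recycle0' argument so that it does not depend on how any particular R version handles it.

```r
# Hypothetical helper (an assumption for illustration, not base R):
# emulates "apply zero-length recycling to the 'sep' step only,
# then collapse whatever is left".
paste_two_step <- function(..., sep = " ", collapse = NULL) {
  args <- list(...)
  # Step 1 ('sep' step): any zero-length argument gives a zero-length result.
  if (any(lengths(args) == 0L)) {
    pasted <- character(0)
  } else {
    pasted <- do.call(paste, c(args, list(sep = sep)))
  }
  # Step 2 ('collapse' step): unary; a non-NULL collapse yields one string.
  if (is.null(collapse)) pasted else paste(pasted, collapse = collapse)
}

x <- c("a", "b")
y <- character(0)
z <- c("c", "d")

paste_two_step(x, y, z, sep = "", collapse = ", ")
# [1] ""

# Completely different x and z, same result, because y is empty.
paste_two_step("something", y, "else entirely", sep = "", collapse = ", ")
# [1] ""

# With no zero-length argument the helper behaves like paste().
paste_two_step(x, z, sep = "", collapse = ", ")
# [1] "ac, bd"
```

Under this reading the contents of x and z never reach the output once y is empty, which is the behavior I find surprising.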
I seem to be being largely outvoted anyway though, so we will see what
Martin and others who may pop up might think, but I raised the points I
wanted to raise so we'll see where things ultimately fall.
~G
>
> Cheers,
> H.
>
>
> On 5/22/20 11:26, Gabriel Becker wrote:
> > I understand that this is consistent but it also strikes me as an
> > enormous 'gotcha' of a magnitude that 'we' are trying to avoid/smooth
> > over at this point in user-facing R space.
> >
> > For the record I'm not suggesting it should return something other than
> > "", and in particular I'm not arguing that any call to paste /that does
> > not return an error/ with non-NULL collapse should return a character
> > vector of length one.
> >
> > Rather I'm pointing out that it could (perhaps should, imo) simply be an
> > error, which is also consistent, in the strict sense, with
> > previous behavior in that it is the developer simply declining to extend
> > the recycle0 argument to the full parameter space (there is no rule that
> > says we must do so, arguments whose use is incompatible with other
> > arguments can be reasonable and called for).
> >
> > I don't feel super strongly that returning "" in this and similar
> > cases is horrible and should never happen, but I'd bet dollars to donuts
> > that to the extent that behavior occurs it will be a disproportionately
> > major source of bugs, and I think that's at least worth considering in
> > addition to pure consistency.
> >
> > ~G
> >
> > On Fri, May 22, 2020 at 9:50 AM William Dunlap <wdunlap using tibco.com> wrote:
> >
> > I agree with Herve, processing collapse happens last so
> > collapse=non-NULL always leads to a single character string being
> > returned, the same as paste(collapse=""). See the altPaste function
> > I posted yesterday.
> >
> > Bill Dunlap
> > TIBCO Software
> > wdunlap tibco.com
> >
> >
> > On Fri, May 22, 2020 at 9:12 AM Hervé Pagès <hpages using fredhutch.org> wrote:
> >
> > I think that
> >
> > paste(c("a", "b"), NULL, c("c", "d"), sep = " ", collapse = ",",
> >       recycle0=TRUE)
> >
> > should just return an empty string and I don't see why it needs to emit
> > a warning or raise an error. To me it does exactly what the user is
> > asking for, which is to change how the 3 arguments are recycled
> > **before** the 'sep' operation.
> >
> > The 'recycle0' argument has no business in the 'collapse' operation
> > (which comes after the 'sep' operation): this operation still behaves
> > like it always has.
> >
> > That's all there is to it.
> >
> > H.
> >
> >
> > On 5/22/20 03:00, Gabriel Becker wrote:
> > > Hi Martin et al,
> > >
> > >
> > >
> > > On Thu, May 21, 2020 at 9:42 AM Martin Maechler
> > > <maechler using stat.math.ethz.ch> wrote:
> > >
> > > >>>>> Hervé Pagès
> > > >>>>> on Fri, 15 May 2020 13:44:28 -0700 writes:
> > >
> > > > There is still the situation where **both** 'sep' and 'collapse' are
> > > > specified:
> > >
> > > >> paste(integer(0), "nth", sep="", collapse=",")
> > > > [1] "nth"
> > >
> > > > In that case 'recycle0' should **not** be ignored i.e.
> > >
> > > > paste(integer(0), "nth", sep="", collapse=",", recycle0=TRUE)
> > >
> > > > should return the empty string (and not character(0) like it does at
> > > > the moment).
> > >
> > > > In other words, 'recycle0' should only control the first operation
> > > > (the operation controlled by 'sep'). Which makes plenty of sense: the
> > > > 1st operation is binary (or n-ary) while the collapse operation is
> > > > unary. There is no concept of recycling in the context of unary
> > > > operations.
> > >
> > > Interesting, ..., and sounding somewhat convincing.
> > >
> > > > On 5/15/20 11:25, Gabriel Becker wrote:
> > > >> Hi all,
> > > >>
> > > >> This makes sense to me, but I would think that recycle0 and collapse
> > > >> should actually be incompatible and paste should throw an error if
> > > >> recycle0 were TRUE and collapse were declared in the same call. I
> > > >> don't think the value of recycle0 should be silently ignored if it is
> > > >> actively specified.
> > > >>
> > > >> ~G
> > >
> > > Just to summarize what I think we should know and agree (or be
> > > "disproven") and where this comes from ...
> > >
> > > 1) recycle0 is a new R 4.0.0 option in paste() / paste0() which by
> > >    default (recycle0 = FALSE) should (and *does* AFAIK) not change
> > >    anything, hence paste() / paste0() behave completely back-compatible
> > >    if recycle0 is kept to FALSE.
> > >
> > > 2) recycle0 = TRUE is meant to give different behavior, notably
> > >    0-length arguments (among '...') should result in 0-length results.
> > >
> > >    The above does not specify what this means in detail, see 3)
> > >
> > > 3) The current R 4.0.0 implementation (for which I'm primarily
> > >    responsible) and help(paste) are in accordance.
> > >    Notably the help page (Arguments -> 'recycle0' ; Details 1st para ;
> > >    Examples) says and shows how the 4.0.0 implementation has been meant
> > >    to work.
> > >
> > > 4) Several provenly smart members of the R community argue that
> > >    both the implementation and the documentation of 'recycle0 = TRUE'
> > >    should be changed to be more logical / coherent / sensical ..
> > >
> > > Is the above all correct in your view?
> > >
> > > Assuming yes, I read basically two proposals, both agreeing
> > > that recycle0 = TRUE should only ever apply to the action of 'sep'
> > > but not the action of 'collapse'.
> > >
> > > 1) Bill and Hervé (I think) propose that 'recycle0' should have
> > >    no effect whenever 'collapse = <string>'
> > >
> > > 2) Gabe proposes that 'collapse = <string>' and 'recycle0 = TRUE'
> > >    should be declared incompatible and error. If going in that
> > >    direction, I could also see them to give a warning (and
> > >    continue as if recycle0 = FALSE).
> > >
> > >
> > > Herve makes a good point about when sep and collapse are both set. That
> > > said, if the user explicitly sets recycle0, personally, I don't think it
> > > should be silently ignored under any configuration of other arguments.
> > >
> > > If all of the arguments are to go into effect, the question then becomes
> > > one of ordering, I think.
> > >
> > > Consider
> > >
> > > paste(c("a", "b"), NULL, c("c", "d"), sep = " ", collapse = ",",
> > >       recycle0=TRUE)
> > >
> > > Currently that returns character(0), because the logic is
> > > essentially (in pseudo-code)
> > >
> > > collapse(paste(c("a", "b"), NULL, c("c", "d"), sep = " ",
> > >                recycle0=TRUE), collapse = ",", recycle0=TRUE)
> > >
> > > -> collapse(character(0), collapse = ",", recycle0=TRUE)
> > >
> > > -> character(0)
> > >
> > > Now Bill Dunlap argued, fairly convincingly I think, that paste(...,
> > > collapse=<string>) should /always/ return a character vector of length
> > > exactly one. With recycle0, though, it will return "" via the progression
> > >
> > > paste(c("a", "b"), NULL, c("c", "d"), sep = " ", collapse = ",",
> > >       recycle0=TRUE)
> > >
> > > -> collapse(character(0), collapse = ",")
> > >
> > > -> ""
> > >
> > >
> > > because recycle0 is still applied to the sep-based operation which
> > > occurs before collapse, thus leaving a vector of length 0 to collapse.
> > >
> > > That is consistent but seems unlikely to be what the user wanted, imho.
> > > I think if it does this there should be at least a warning when paste
> > > collapses to "" this way, if it is allowed at all (ie if mixing
> > > collapse=<string> and recycle0=TRUE is not simply made an error).
> > >
> > > I would like to hear others' thoughts as well though. @Pages, Herve
> > > @William Dunlap is "" what you envision as the desired and
> > > useful behavior there?
> > >
> > > Best,
> > > ~G
> > >
> > >
> > >
> > > I have not yet made up my mind but would tend to agree to "you guys",
> > > but I think that other R Core members should chime in, too.
> > >
> > > Martin
> > >
> > > >> On Fri, May 15, 2020 at 11:05 AM Hervé Pagès
> > > >> <hpages using fredhutch.org> wrote:
> > > >>
> > > >> Totally agree with that.
> > > >>
> > > >> H.
> > > >>
> > > >> On 5/15/20 10:34, William Dunlap via R-devel wrote:
> > > >> > I agree: paste(collapse="something", ...) should always return a
> > > >> > single character string, regardless of the value of recycle0. This
> > > >> > would be similar to when there are no non-NULL arguments to paste;
> > > >> > collapse="." gives a single empty string and collapse=NULL gives a
> > > >> > zero long character vector.
> > > >> >> paste()
> > > >> > character(0)
> > > >> >> paste(collapse=", ")
> > > >> > [1] ""
> > > >> >
> > > >> > Bill Dunlap
> > > >> > TIBCO Software
> > > >> > wdunlap tibco.com
> > > >> >
> > > >> >
> > > >> > On Thu, Apr 30, 2020 at 9:56 PM suharto_anggono--- via R-devel
> > > >> > <r-devel using r-project.org> wrote:
> > > >> >
> > > >> >> Without 'collapse', 'paste' pastes (concatenates) its arguments
> > > >> >> elementwise (separated by 'sep', " " by default). New in R devel
> > > >> >> and R patched, specifying recycle0 = TRUE makes mixing zero-length
> > > >> >> and nonzero-length arguments result in length zero. The result of
> > > >> >> paste(n, "th", sep = "", recycle0 = TRUE) always has the same
> > > >> >> length as 'n'. Previously, the result was still as long as the
> > > >> >> longest argument, with the zero-length argument treated like "".
> > > >> >> If all of the arguments have length zero, 'recycle0' doesn't matter.
> > > >> >>
> > > >> >> As far as I understand, 'paste' with 'collapse' as a character
> > > >> >> string is supposed to put together elements of a vector into a
> > > >> >> single character string. I think 'recycle0' shouldn't change it.
> > > >> >>
> > > >> >> In current R devel and R patched, paste(character(0), collapse = "",
> > > >> >> recycle0 = TRUE) is character(0). I think it should be "", like
> > > >> >> paste(character(0), collapse="").
> > > >> >>
> > > >> >> paste(c("4", "5"), "th", sep = "", collapse = ", ", recycle0 = TRUE)
> > > >> >> is
> > > >> >> "4th, 5th".
> > > >> >> paste(c("4"     ), "th", sep = "", collapse = ", ", recycle0 = TRUE)
> > > >> >> is
> > > >> >> "4th".
> > > >> >> I think
> > > >> >> paste(c(        ), "th", sep = "", collapse = ", ", recycle0 = TRUE)
> > > >> >> should be
> > > >> >> "",
> > > >> >> not character(0).
> > > >> >>
> > > >> >> ______________________________________________
> > > >> >> R-devel using r-project.org mailing list
> > > >> >> https://stat.ethz.ch/mailman/listinfo/r-devel
> > > >> >>
> > > >> >
> > > >>
> > > >> --
> > > >> Hervé Pagès
> > > >>
> > > >> Program in Computational Biology
> > > >> Division of Public Health Sciences
> > > >> Fred Hutchinson Cancer Research Center
> > > >> 1100 Fairview Ave. N, M1-B514
> > > >> P.O. Box 19024
> > > >> Seattle, WA 98109-1024
> > > >>
> > > >> E-mail: hpages using fredhutch.org
> > > >> Phone: (206) 667-5791
> > > >> Fax: (206) 667-1319
> > > >>