[Rd] "+" for character method...
Gabor Grothendieck
ggrothendieck at gmail.com
Sat Aug 26 17:43:02 CEST 2006
There are several problems with %+% :
- %whatever% operators should stay open for use by the user, and if R
starts taking them over they won't be
- %+% is ugly
- %+% is not consistent with other languages (the C-based syntax
of R is supposed to leverage off one's knowledge of other languages)
Personally I would prefer the status quo, + or paste0 to %+%.
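
To make the alternatives concrete, here is a minimal sketch. The `%+%` definition below is only an illustration of what a user-level operator could look like, not something base R provides, and paste0 is assumed to be shorthand for paste(..., sep = "") (as it is in current versions of R).

```r
# Hypothetical user-defined operator: the kind of %whatever% name that
# stays available to users only if base R does not claim it.
`%+%` <- function(a, b) paste(a, b, sep = "")

x <- "foo" %+% "bar"                  # operator form
y <- paste("foo", "bar", sep = "")    # status quo
z <- paste0("foo", "bar")             # assumed shorthand for paste(..., sep = "")

# All three build the same string.
stopifnot(identical(x, y), identical(y, z))

# Like paste(), the operator is vectorised and recycles its arguments.
v <- "a" %+% as.character(1:3)
```

The three spellings differ only in syntax, so the choice comes down to readability and to who owns the %...% namespace.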
On 8/26/06, Duncan Murdoch <murdoch at stats.uwo.ca> wrote:
> On 8/26/2006 10:26 AM, John Chambers wrote:
> > Well, two comments, in two non-compatible directions.
> >
> > 1. I have to say that I find the idea of using "+" to paste character
> > strings together aesthetically ugly.
> >
> > IMO, one thing that makes functional object-based languages attractive
> > is that the generic function retains a consistent _function_, that is,
> > purpose and meaning, of which the methods are implementations.
> >
> > It escapes me totally why I should think of pasting strings as addition
> > in the mathematical or intuitive sense (as Brian points out re
> > commutativity, it fails a number of axiomatic properties). And if so,
> > what about "-", "*", "/" and so on? The mind boggles.
>
> Assuming that your "totally" is literally true:
>
> Strings don't form a commutative group under concatenation, but the
> operation is associative, and there's a zero element "". This makes
> them a monoid or unitary semigroup. The natural numbers (including
> zero) are another example of a monoid under addition. It's not that
> weird to have addition defined without negatives.
>
> Concatenation seems to me to be the most natural interpretation of
> addition for strings.
>
> According to Wikipedia, the "+" operator is used for concatenation in
> BASIC, Pascal, Delphi, Javascript, Java, Python, C++ and Ruby. These
> are probably the most commonly used modern languages other than C (which
> has no concatenation operator) or Fortran (which I just discovered today
> uses "//").
>
> Other possibilities on the Wikipedia page that don't conflict with
> something else in R are:
>
> Visual Basic and VHDL use the "&" sign.
>
> Standard SQL, PL/I, and Maple from version 6 use double pipe signs ("||").
>
> OCaml uses "^".
>
> So it seems to me that defining addition of strings to be concatenation
> is a reasonably widespread convention.
>
> I don't think there are widespread conventions for subtraction,
> multiplication or division of strings, so I can't see any argument for
> implementing them.
>
> > Its excuse presumably is to save typing, but I would favor using some
> > %thing% operator at the cost of a couple of extra key strokes.
>
> I think consistency with other common languages is a stronger reason.
> Other than that, I'd be perfectly happy with %+%.
>
> Duncan Murdoch
>
>
> >
> > 2. Having said that, it's a reasonable hope that efficiency of
> > dispatch will not be a serious problem. There are a bunch of fixes, for
> > semantic correctness and efficiency, nearly ready to commit (the
> > Bioconductor folks have been doing some valuable testing). These should
> > help, and more important perhaps it's fairly easy to see how dispatch in
> > this form can be tuned for performance if necessary.
> >
> > John
> >
> > Bill Dunlap wrote:
> >>>> >> There have been propositions to make "+" work in S (and
> >>>> >> R) like in some other languages, namely for character
> >>>> >> (vectors),
> >>>> >>
> >>>> >> a + b := paste(a,b, sep="")
> >>>> ...
> >>>> yes. I think however if we keep speed and clarity and catching
> >>>> user errors all in mind, it would be enough - and better - to
> >>>> only dispatch to paste(.,.) when both arguments are character
> >>>> (vectors), i.e., the above case needed
> >>>> "a" + as.character(1:7) or "a" + paste(1:7) or "a" + format(1:7)
> >>>> which after all is really clearer, even more so for cases like
> >>>> "1" + 2, which I'd rather keep giving errors.
> >>>>
> >>>> If Char + Num should work like above, then also
> >>>> Num + Char should (since after all, "+" should be commutative
> >>>> apart from floating point precision issues).
> >>>>
> >>>> and so the internal C code gets a bit more complicated and slightly
> >>>> slower.. something we had in mind we should strongly avoid...
> >>>>
> >>> I doubt that it would be measurably slower, but I agree that requiring
> >>> both args to be Char could be done in fewer operations than just
> >>> requiring one.
> >>>
> >>> However, I think the consistency argument is stronger. We have a rule
> >>> that operations on mixed types promote the more restrictive type to the
> >>> less restrictive one, and I don't think we should handle this case
> >>> differently.
> >>>
> >>> So I'd say we should allow all of Char + Num, Num + Char, and Char +
> >>> Char, or, if this costs too much at evaluation time, we shouldn't allow
> >>> any of them.
> >>>
> >> Currently doing arithmetic on mixed class data.frames
> >> produces useful warnings and errors. E.g.,
> >>
> >> > z <- data.frame(Factor=factor(c("Lo","Med","High")),
> >>                   Char=letters[1:3],
> >>                   Num1=exp(0:2),
> >>                   Num2=(1:3)*pi,
> >>                   stringsAsFactors=FALSE)
> >> > z+1
> >> Error in FUN(left, right) : non-numeric argument to binary operator
> >> In addition: Warning message:
> >> + not meaningful for factors in: Ops.factor(left, right)
> >> > z[,-2] + 1
> >>   Factor     Num1      Num2
> >> 1     NA 2.000000  4.141593
> >> 2     NA 3.718282  7.283185
> >> 3     NA 8.389056 10.424778
> >> Warning message:
> >> + not meaningful for factors in: Ops.factor(left, right)
> >>
> >> If we made + do paste(sep="") for character+number then
> >> we would lose the messages and let garbage flow further
> >> down the pipe.
> >>
> >> Should factor data be treated as character data in this
> >> case (e.g., pasting to the levels)? That would be weird,
> >> but many users confound character and factor data when
> >> they are buried in data.frames.
> >>
> >> ----------------------------------------------------------------------------
> >> Bill Dunlap
> >> Insightful Corporation
> >> bill at insightful dot com
> >> 360-428-8146
> >>
> >> "All statements in this message represent the opinions of the author and do
> >> not necessarily reflect Insightful Corporation policy or position."
> >>
> >> ______________________________________________
> >> R-devel at r-project.org mailing list
> >> https://stat.ethz.ch/mailman/listinfo/r-devel
> >>
> >>
> >
>
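
The algebraic points in the quoted discussion can be checked directly. The sketch below again uses a hypothetical `%+%` rather than "+", and follows the stricter variant suggested in the thread: it concatenates only when both arguments are character and otherwise signals an error, so that cases like "1" + 2 keep failing. The error message text is made up for illustration.

```r
# Hypothetical operator: concatenate only when both sides are character;
# anything else is an error rather than a silent coercion.
`%+%` <- function(a, b) {
  if (!is.character(a) || !is.character(b))
    stop("non-character argument to %+%")
  paste(a, b, sep = "")
}

s1 <- "x"; s2 <- "y"; s3 <- "z"

# Associative, with "" as the identity element (a monoid) ...
stopifnot(identical((s1 %+% s2) %+% s3, s1 %+% (s2 %+% s3)))
stopifnot(identical(s1 %+% "", s1), identical("" %+% s1, s1))

# ... but not commutative.
stopifnot(!identical(s1 %+% s2, s2 %+% s1))

# Mixed types are rejected, so user errors are still caught.
res <- tryCatch("1" %+% 2, error = function(e) "error")
```

Promoting the numeric argument instead, as the consistency argument in the thread would have it, would amount to dropping the is.character() check and letting paste() coerce.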