[Rd] "+" for character method...
Prof Brian Ripley
ripley at stats.ox.ac.uk
Fri Aug 25 23:25:00 CEST 2006
1) I'd like to take a look at what is involved before commenting on
efficiency issues. They may not be what I thought they were (or at least,
being generic at all may be so big a hit that a few more cases may be
immaterial).
2) This + is clearly not commutative.
3) + is part of a group generic. I think it is a little awkward to change
the dispatch rules for just one member of the group. That's one argument
for character + character being different from character + number.
I'm not much in favour of adding special cases.
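
As a rough illustration of points 2) and 3), here is a minimal sketch. The
class "cstr" and its constructor are hypothetical (nothing of that name exists
in R); they only serve to give a character vector a class attribute, so that
the existing internal dispatch has something to act on. A single method for
the Ops group then handles every member, and the + branch shows that the
result depends on operand order.

```r
# Hypothetical S3 class "cstr" (an assumption for illustration, not part of R):
# a character vector with a class attribute.
cstr <- function(x) structure(as.character(x), class = "cstr")

# One method for the whole Ops group; .Generic names the operator in use.
Ops.cstr <- function(e1, e2) {
  a <- unclass(e1)
  b <- unclass(e2)
  if (.Generic == "+") {
    return(cstr(paste(a, b, sep = "")))  # concatenation instead of addition
  }
  get(.Generic)(a, b)                    # other members keep base behaviour
}

x <- cstr("ab")
y <- cstr("cd")
xy <- unclass(x + y)   # first operand comes first in the result
yx <- unclass(y + x)   # swapping the operands changes the result
lt <- x < y            # reaches the same method through the "<" member

# Plain character vectors have no class attribute, so base "+" signals an error.
failed <- inherits(tryCatch("ab" + "cd", error = function(e) e), "error")
```

Treating + differently from the other members means a branch like the one
above inside the group method, or separate dispatch rules for one operator.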
Brian
On Fri, 25 Aug 2006, Martin Maechler wrote:
> >>>>> "Duncan" == Duncan Murdoch <murdoch at stats.uwo.ca>
> >>>>> on Fri, 25 Aug 2006 13:18:42 -0400 writes:
>
> Duncan> On 8/25/2006 12:31 PM, Martin Maechler wrote:
> >> This thread reminds me of an old recurring (last May!)
> >> theme which maybe fits well to Friday late afternoon...
> >>
> >> There have been propositions to make "+" work in S (and
> >> R) like in some other languages, namely for character
> >> (vectors),
> >>
> >> a + b := paste(a,b, sep="")
> >>
> >> IIRC, when this theme came up last, the one argument
> >> against it was the penalty of method dispatch that we
> >> were not willing to pay for something as fundamentally
> >> speed-important as "+" -- which is a .Primitive in R
> >> exactly for that reason of efficiency.
> >>
> >> But then, we actually do dispatch for "+" -- internally
> >> in C code via DispatchGroup() --- but only if we need to, so
> >> not when usual numeric/complex arguments are used.
> >>
> >> I think - but may be wrong - it should be possible to
> >> also check very fast for two "character" arguments and in
> >> that case do a fast version of paste(a, b, sep="").
>
> Duncan> But for consistency shouldn't this work if only one
> Duncan> of the args is character, coercing the other to
> Duncan> character? E.g. we have
>
> Duncan> > "2" > 10
> Duncan> [1] TRUE
>
> yes. But see also below
>
> >> When this last came up (in May), Brian said this about
> >> the fact that you cannot simply define
> >> "+.character"
> >>
> >>>> I would think that the intention was also to positively
> >>>> discourage messing with the basics of R, as if you were
> >>>> able to do this, erroneous uses would likely not get
> >>>> caught.
> >> (https://stat.ethz.ch/pipermail/r-help/2006-May/104751.html)
> >> and subsequently
> >> (https://stat.ethz.ch/pipermail/r-help/2006-May/104754.html)
> >> gave an example for this
> >>
> >>>> 2 + x, for example, where x is not numeric.
>
> Duncan> This is a valid concern, but I think the clarity
> Duncan> obtained by coding paste operations using + is worth
> Duncan> it.
>
> Duncan> For example, the first instance of paste(a, b,
> Duncan> sep="") I see in the source is
>
> Duncan> is.ALL(structure(1:7, names = paste("a",1:7,sep="")))
>
> Duncan> in base/demo/is.things.R
>
> Duncan> which I find clearer as
>
> Duncan> is.ALL(structure(1:7, names = "a" + 1:7))
>
>
> Duncan> But then I'm used to using + for strings from
> Duncan> Borland's Pascal extensions; to a C-speaker the
> Duncan> meaning may not be so obvious.
>
> yes. I think however if we keep speed and clarity and catching
> user errors all in mind, it would be enough - and better - to
> only dispatch to paste(.,.) when both arguments are character
> (vectors), i.e., the above case needed
> "a" + as.character(1:7) or "a" + paste(1:7) or "a" + format(1:7)
> which after all is really clearer, even more so for cases like
> "1" + 2, which I'd rather keep giving errors.
>
> If Char + Num should work like above, then also
> Num + Char should (since after all, "+" should be commutative
> apart from floating point precision issues).
>
> and so the internal C code gets a bit more complicated and slightly
> slower.. something we had in mind we should strongly avoid...
>
> Martin
>
> >> I wonder however, if we do this in C, and basically only
> >> go into the paste-branch when both arguments are
> >> characters, if we wouldn't get to a nice useful solution
> >> without a noticeable performance penalty.
> >>
> >> This would also address my other, slightly related, unease:
> >> Many times in the past, when using paste(..., sep='')
> >> in function definitions I had wanted this (empty sep) to
> >> be the default and to have an easier, more readable way
> >> to achieve the same.
> >>
> >> But then these all are just musings at the end of the
> >> week...
> >>
> >> Martin Maechler, ETH Zurich
> >>
> >> ______________________________________________
> >> R-devel at r-project.org mailing list
> >> https://stat.ethz.ch/mailman/listinfo/r-devel
>
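
The both-arguments-character rule quoted above can be sketched at R level
without touching "+" at all. The infix operator %+% below is hypothetical (an
assumption for illustration, not something in base R, and not the C-level
change under discussion); it pastes only when both sides are character and
signals an error otherwise, so mixed cases such as "1" and 2 are still caught.

```r
# Hypothetical infix operator (an assumption, not part of base R): paste only
# when both sides are character vectors; anything else is an error.
"%+%" <- function(e1, e2) {
  if (!is.character(e1) || !is.character(e2)) {
    stop("both arguments must be character vectors")
  }
  paste(e1, e2, sep = "")
}

# Explicit coercion of the numeric side, as in the quoted text.
nms <- "a" %+% as.character(1:7)

# A mixed character/number call is rejected rather than silently coerced.
mixed <- tryCatch("1" %+% 2, error = function(e) "error")
```

This says nothing about the cost of doing the same check inside the C code
for "+", which is the efficiency question in 1).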
--
Brian D. Ripley, ripley at stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595