[Rd] "+" for character method...
Duncan Murdoch
murdoch at stats.uwo.ca
Sat Aug 26 02:01:07 CEST 2006
On 8/25/2006 6:52 PM, Bill Dunlap wrote:
>>> >> There have been proposals to make "+" work in S (and
>>> >> R) like in some other languages, namely for character
>>> >> (vectors),
>>> >>
>>> >> a + b := paste(a,b, sep="")
>>> ...
>>> yes. I think however if we keep speed and clarity and catching
>>> user errors all in mind, it would be enough - and better - to
>>> only dispatch to paste(.,.) when both arguments are character
>>> (vectors), i.e., the above case needed
>>> "a" + as.character(1:7) or "a" + paste(1:7) or "a" + format(1:7)
>>> which after all is really clearer, all the more so for cases like
>>> "1" + 2, which I'd rather keep giving errors.
>>>
>>> If Char + Num should work like above, then also
>>> Num + Char should (since after all, "+" should be commutative
>>> apart from floating point precision issues).
>>>
>>> and so the internal C code gets a bit more complicated and slightly
>>> slower.. something we had in mind we should strongly avoid...
>> I doubt that it would be measurably slower, but I agree that requiring
>> both args to be Char could be done in fewer operations than just
>> requiring one.
>>
>> However, I think the consistency argument is stronger. We have a rule
>> that operations on mixed types promote the more restrictive type to the
>> less restrictive one, and I don't think we should handle this case
>> differently.
>>
>> So I'd say we should allow all of Char + Num, Num + Char, and Char +
>> Char, or, if this costs too much at evaluation time, we shouldn't allow
>> any of them.
>
> Currently doing arithmetic on mixed class data.frames
> produces useful warnings and errors. E.g.,
>
> > z <- data.frame(Factor=factor(c("Lo","Med","High")),
>                   Char=letters[1:3],
>                   Num1=exp(0:2),
>                   Num2=(1:3)*pi,
>                   stringsAsFactors=FALSE)
> > z+1
> Error in FUN(left, right) : non-numeric argument to binary operator
> In addition: Warning message:
> + not meaningful for factors in: Ops.factor(left, right)
> > z[,-2] + 1
>   Factor     Num1      Num2
> 1     NA 2.000000  4.141593
> 2     NA 3.718282  7.283185
> 3     NA 8.389056 10.424778
> Warning message:
> + not meaningful for factors in: Ops.factor(left, right)
>
> If we made + do paste(sep="") for character+number then
> we would lose the messages and let garbage flow further
> down the pipe.
Yes, I agree, that's a negative. But it is consistent with what we do
elsewhere, and consistency is a good thing:

  > z > 1
    Factor Char  Num1 Num2
  1     NA TRUE FALSE TRUE
  2     NA TRUE  TRUE TRUE
  3     NA TRUE  TRUE TRUE
  Warning message:
  > not meaningful for factors in: Ops.factor(left, right)

We get the warning for the factor column, but not the character column.
But is it really common to add values to a data.frame? Are we going to
protect anyone from an error they would really make?
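
To make the consistency point concrete, here is a minimal sketch in plain R of the promotion that comparison operators and c() already perform, next to the current behaviour of "+". The variable names are only for illustration, and the way strings are ordered in comparisons depends on the locale's collation.

```r
# Comparison coerces the number to character and compares strings,
# so "10" < 9 is evaluated as "10" < "9" (not as 10 < 9).
cmp_mixed  <- "10" < 9
cmp_string <- "10" < "9"
identical(cmp_mixed, cmp_string)   # TRUE

# c() applies the same promotion: the result is a character vector.
promoted <- c("a", 1)
class(promoted)                    # "character"

# Arithmetic does no such promotion and signals an error instead.
arith <- tryCatch("a" + 1, error = function(e) "error")
arith                              # "error"
```
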
> Should factor data be treated as character data in this
> case (e.g., pasting to the levels)? That would be weird,
> but many users confound character and factor data when
> they are buried in data.frames.
I'd be happy to continue to have the warning in that case. paste() is
pretty flexible, so there would be a lot of cases where paste(x, y,
sep="") gave a result but x+y gave a warning or error.
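
As a rough sketch of that gap, and of what a "paste if either operand is character" rule might look like at user level: `%+%` below is a hypothetical operator invented purely for illustration, not an existing or proposed R function. The change under discussion would live in the internal code for "+", so this only approximates the idea.

```r
f <- factor(c("Lo", "Med", "High"))

# paste() converts the factor to its labels and pastes without complaint.
pasted <- paste(f, 1:3, sep = "")
pasted                             # "Lo1" "Med2" "High3"

# "+" on a factor warns and yields NA (the warning is suppressed here).
f_plus <- suppressWarnings(f + 1:3)
f_plus                             # NA NA NA

# Hypothetical operator: paste if either operand is character,
# otherwise fall back to "+", so factors keep their warning.
`%+%` <- function(e1, e2) {
  if (is.character(e1) || is.character(e2)) {
    paste(e1, e2, sep = "")
  } else {
    e1 + e2
  }
}

"a" %+% 1:3                        # "a1" "a2" "a3"
1 %+% "a"                          # "1a"
2 %+% 3                            # 5
```

With this rule Char + Num and Num + Char behave the same way, while a factor operand with a numeric one still goes through Ops.factor and keeps its warning.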
Duncan Murdoch
>
> ----------------------------------------------------------------------------
> Bill Dunlap
> Insightful Corporation
> bill at insightful dot com
> 360-428-8146
>
> "All statements in this message represent the opinions of the author and do
> not necessarily reflect Insightful Corporation policy or position."