[Rd] R (development) changes in arith, logic, relop with (0-extent) arrays

Fri Sep 9 21:15:08 CEST 2016

Martin et al.,

I seem to be in the minority here, so I won't belabor the point too much,
but one last response inline:

On Thu, Sep 8, 2016 at 11:51 PM, Martin Maechler <maechler at stat.math.ethz.ch> wrote:

> Thank you, Gabe and Bill,
>
> for taking up the discussion.
>
> >>>>> William Dunlap <wdunlap at tibco.com>
> >>>>>     on Thu, 8 Sep 2016 10:45:07 -0700 writes:
>
>     > Prior to the mid-1990s, S did "length-0 OP length-n -> rep(NA, n)"
>     > and it was changed to "length-0 OP length-n -> length-0" to avoid
>     > lots of problems like any(x<0) being NA when length(x)==0.  Yes,
>     > people could code defensively by putting lots of if(length(x)==0)...
>     > in their code, but that is tedious and error-prone and creates
>     > really ugly code.
>
> Yes, so actually, basically
>
>      length-0 OP <anything>  -> length-0
>
> Now the case of NULL that Bill mentioned.
> I agree that NULL  is not at all the same thing as  double(0) or
> logical(0), *but* there have been quite a few cases, where NULL is the
> result of operations where "for consistency"  double(0) / logical(0)
> should have been.... and there are the users who use NULL as the
> equivalent of those, e.g., by initializing a (to be grown, yes, very
> inefficient!) vector with NULL instead of with say double(0).
>
> For these reasons, many operations that expect a "number-like"
> (includes logical) atomic vector have treated NULL as such...
> *and* parts of the {arith/logic/relop} OPs have done so already
> in R "forever".
> I still would argue that for these OPs, treating NULL as  logical(0) {which
> then may be promoted by the usual rules} is a good thing.
>
>
>     > Is your suggestion to leave the length-0 OP length-1 case as it is
>     > but make length-0 OP length-two-or-higher an error or warning (akin
>     > to the length-2 OP length-3 case)?
>
> That's exactly one thing the current changes eliminated:
> arithmetic (only; not logic, or relop) did treat the length-1
> case (for arrays!) differently from the length-GE-2 case.  And I strongly
> believe that this is very wrong and counter to the predominant
> recycling rules in (S and) R.
>

In my view, the recycling rules apply first and foremost to pairs of
vectors of lengths n, m >= 1. And they can be semantically explained in that
case very easily: "the shorter, non-zero-length vector is rep'ed out to be
the length of the longer vector and then (generally) an element-wise
operation takes place". The zero-length behavior already does not adhere to
this definition, since a zero-length vector cannot be rep'ed out to match a
nonzero-length vector.
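To make that concrete, here is a minimal sketch (the variable names are mine and purely illustrative). For non-zero lengths the result matches an explicit rep(); for a zero-length operand it does not, because rep'ing a zero-length vector out to a positive length pads with NA rather than producing something zero-length:

```r
# Non-zero lengths: the shorter vector is rep'ed out to the longer length.
x <- 1:6
y <- c(10, 20)
recycled <- x + y
explicit <- x + rep(y, length.out = length(x))
identical(recycled, explicit)

# Zero-length operand: the result is zero-length ...
z <- numeric()
length(z + x)

# ... whereas rep'ing z out to length(x) gives a length-6 vector of NAs,
# so the "rep out the shorter vector" description does not cover this case.
length(rep(z, length.out = length(x)))
```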

So the zero-length recycling behavior is already special-cased as I
understand it. In light of that, it seems that it would be allowable to
have different behavior based on the length of the other vector.
Furthermore, while I acknowledge the usefulness of the

x = numeric()

x <  5

case (i.e., the other vector is length 1), I can't come up with any use of,
e.g.,

y = numeric()
y < 3:5

that I can make any sense of, other than as a violation of the coder's
implicit assumptions about the length of y.
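For reference, a small sketch of how the two cases behave as far as I know (names are illustrative only). Both return logical(0) silently, whereas a length mismatch between two non-empty vectors does produce a warning:

```r
# Length-0 against length-1: the useful case.
x <- numeric()
scalar_case <- x < 5            # logical(0)
any_below <- any(scalar_case)   # FALSE, so if (any(x < 5)) is safe to write

# Length-0 against length-3: also logical(0), with no warning.
y <- numeric()
vector_case <- y < 3:5

# For comparison, length-2 against length-3 warns that the longer length
# is not a multiple of the shorter one.
mismatch_warned <- tryCatch({ 1:2 < 3:5; FALSE },
                            warning = function(w) TRUE)
```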

Thus, I still think that case should at *least* warn, and preferably (imho)
give an error.
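As a rough illustration of the behavior I have in mind, here is a user-level sketch. strict_op() is a hypothetical helper of my own, not anything in base R and not a proposal for how it would be implemented internally; it only shows the rule "zero-length against length 0 or 1 is fine, zero-length against length 2 or more is an error":

```r
# Hypothetical wrapper: apply a binary operator, but refuse to combine a
# zero-length operand with an operand of length 2 or more.
strict_op <- function(op, e1, e2) {
  n1 <- length(e1)
  n2 <- length(e2)
  if ((n1 == 0L && n2 > 1L) || (n2 == 0L && n1 > 1L)) {
    stop("zero-length operand combined with an operand of length > 1")
  }
  op(e1, e2)
}

# The length-1 case keeps working and returns logical(0).
ok <- strict_op(`<`, numeric(), 5)

# The length-3 case now signals an error instead of returning logical(0).
res <- tryCatch(strict_op(`<`, numeric(), 3:5),
                error = function(e) "error")
```

A warning() in place of the stop() would give the milder variant.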

Best,
~G

-- 
Gabriel Becker, PhD
Associate Scientist (Bioinformatics)
Genentech Research
