[Rd] Shouldn't vector indexing with negative out-of-range index give an error?

Wed May 6 19:53:27 CEST 2015

Hi,

On 05/06/2015 09:04 AM, Henrik Bengtsson wrote:
> On Wed, May 6, 2015 at 1:33 AM, Martin Maechler
> <maechler at lynne.stat.math.ethz.ch> wrote:
>>>>>>> John Chambers <jmc at stat.stanford.edu>
>>>>>>>      on Tue, 5 May 2015 08:39:46 -0700 writes:
>>
>>      > When someone suggests that we "might have had a reason" for some peculiarity in the original S, my usual reaction is "Or else we never thought of the problem".
>>      > In this case, however, there is a relevant statement in the 1988 "blue book".  In the discussion of subscripting (p 358) the definition for negative i says: "the indices consist of the elements of seq(along=x) that do not match any elements in -i".
>>
>>      > Suggesting that no bounds checking on -i takes place.
>>
>>      > John
>>
>> Indeed!
>> Thanks a lot John, for the perspective and clarification!
>>
>> I'm committing a patch to the documentation now.
>
> Thank you both and also credits to Dongcan Jiang for pointing out to
> me that errors were indeed not generated in this case.
>
> I agree with the decision. It's interesting to notice that now the
> only way an error is generated is when index-vector subsetting is done
> using mixed positive and negative indices, e.g. x[c(-1,1)].

This is why, in situations where I need to extract a single element from
an atomic vector, I use [[ instead of [. It's safer (it performs
bounds checking), a little bit faster (at least last time I checked), and
it drops the name of the element.
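
To make the difference concrete, here is a minimal sketch. The vector x
and the "error" marker strings are made up for illustration, and the
exact error messages vary between R versions, so the code only checks
that an error is signalled:

```r
x <- c(a = 2, b = 6, c = 9)

# `[` silently ignores a negative index that is out of range ...
x[-9]            # all three elements, names kept
# ... and yields NA for a positive index past the end.
x[5]

# `[[` signals an error for a positive index past the end.
res <- tryCatch(x[[5]], error = function(e) "error")

# `[[` drops the name of the extracted element, `[` keeps it.
names(x[2])      # "b"
names(x[[2]])    # NULL

# Mixing positive and negative indices in `[` is an error.
mixed <- tryCatch(x[c(-1, 1)], error = function(e) "error")
```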

BTW did you know that one can use a negative index with [[ on a
vector of length 2?

   > c(a=2, b=6)[[-1]]
   [1] 6
   > c(a=2, b=6)[[-2]]
   [1] 2
   > list(a=22, b=6:5)[[-1]]
   [1] 6 5
   > list(a=22, b=6:5)[[-2]]
   [1] 22
   > list(a=22, b=6:5)[[c(-1, -2)]]
   [1] 6
   > list(a=22, b=6:5)[[c(-1, -1)]]

Also works with [[<-:

   > x <- list(a=22, b=6:5)
   > x[[c(-1, -2)]] <- 99L
   > x
   $a
   [1] 22

   $b
   [1] 99  5

Not that I ever needed that "feature" though...
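
For what it's worth, a rough sketch of why length 2 is the special case:
a negative index is only accepted by [[ when dropping that element
leaves exactly one element to return. The helper fails() below is
hypothetical (not part of R) and only reports whether an error was
signalled, since the messages differ between R versions. The last lines
assume the usual recursive-indexing rule for lists, where x[[c(i, j)]]
is treated as x[[i]][[j]]:

```r
# Hypothetical helper: TRUE if evaluating `expr` signals an error.
fails <- function(expr) inherits(try(expr, silent = TRUE), "try-error")

one   <- c(a = 2)
two   <- c(a = 2, b = 6)
three <- c(a = 2, b = 6, c = 9)

two[[-1]]            # 6: exactly one element is left
fails(one[[-1]])     # TRUE: nothing would be left
fails(three[[-1]])   # TRUE: two elements would be left

# With a list, a vector index is applied one level at a time.
lst <- list(a = 22, b = 6:5)
identical(lst[[c(-1, -2)]], lst[[-1]][[-2]])
```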

Cheers,
H.

>
> /Henrik
>
>> Martin
>>
>>
>>      > On May 5, 2015, at 7:01 AM, Martin Maechler <maechler at lynne.stat.math.ethz.ch> wrote:
>>
>>      >>>>>>> Henrik Bengtsson <henrik.bengtsson at ucsf.edu>
>>      >>>>>>> on Mon, 4 May 2015 12:20:44 -0700 writes:
>>      >>
>>      >>> In Section 'Indexing by vectors' of 'R Language Definition'
>>      >>> (http://cran.r-project.org/doc/manuals/r-release/R-lang.html#Indexing-by-vectors)
>>      >>> it says:
>>      >>
>>      >>> "Integer. All elements of i must have the same sign. If they are
>>      >>> positive, the elements of x with those index numbers are selected. If
>>      >>> i contains negative elements, all elements except those indicated are
>>      >>> selected.
>>      >>
>>      >>> If i is positive and exceeds length(x) then the corresponding
>>      >>> selection is NA. A negative out of bounds value for i causes an error.
>>      >>
>>      >>> A special case is the zero index, which has null effects: x[0] is an
>>      >>> empty vector and otherwise including zeros among positive or negative
>>      >>> indices has the same effect as if they were omitted."
>>      >>
>>      >>> However, that "A negative out of bounds value for i causes an error"
>>      >>> in the second paragraph does not seem to apply.  Instead, R silently
>>      >>> ignores negative indices that are out of range.  For example:
>>      >>
>>      >>>> x <- 1:4
>>      >>>> x[-9L]
>>      >>> [1] 1 2 3 4
>>      >>>> x[-c(1:9)]
>>      >>> integer(0)
>>      >>>> x[-c(3:9)]
>>      >>> [1] 1 2
>>      >>
>>      >>>> y <- as.list(1:4)
>>      >>>> y[-c(1:9)]
>>      >>> list()
>>      >>
>>      >>> Is the observed non-error the correct behavior and therefore the
>>      >>> documentation is incorrect, or is it vice versa?  (...or is it me
>>      >>> missing something)
>>      >>
>>      >>> I get the above on R devel, R 3.2.0, and as far back as R 2.11.0
>>      >>> (haven't checked earlier versions).
>>      >>
>>      >> Thank you, Henrik!
>>      >>
>>      >> I've checked further back: The change happened between R 2.5.1 and R 2.6.0.
>>      >>
>>      >> The previous behavior was
>>      >>
>>      >>> (1:3)[-(3:5)]
>>      >> Error: subscript out of bounds
>>      >>
>>      >> If you start reading NEWS.2, you see a *lot* of new features
>>      >> (and bug fixes) in the 2.6.0 news, but from my browsing, none of
>>      >> them mentioned the new behavior as feature.
>>      >>
>>      >> Let's -- for a moment -- declare it a bug in the code, i.e., not
>>      >> in the documentation:
>>      >>
>>      >> - As 2.6.0  happened quite a while ago (Oct. 2007),
>>      >> we could wonder how much R code will break if we fix the bug.
>>      >>
>>      >> - Is the R package authors' community willing to do the necessary
>>      >> cleanup in their packages ?
>>      >>
>>      >> ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ----
>>      >>
>>      >>
>>      >> Now, after reading the source code for a while, and looking at
>>      >> the changes, I've found the log entry
>>      >>
>>      >> ------------------------------------------------------------------------
>>      >> r42123 | ihaka | 2007-07-05 02:00:05 +0200 (Thu, 05 Jul 2007) | 4 lines
>>      >>
>>      >> Changed the behaviour of out-of-bounds negative
>>      >> subscripts to match that of S.  Such values are
>>      >> now ignored rather than tripping an error.
>>      >>
>>      >> ------------------------------------------------------------------------
>>      >>
>>      >> So, it was changed on purpose, by one of the true "R"s, very
>>      >> much on purpose.
>>      >>
>>      >> Making it a *warning* instead of the original error
>>      >> may have been both more cautious and more helpful for
>>      >> detecting programming errors.
>>      >>
>>      >> OTOH, John Chambers, the father of S and hence grandfather of R,
>>      >> may have had good reasons why it seemed more logical to silently
>>      >> ignore such out of bound negative indices:
>>      >> One could argue that
>>      >>
>>      >> x[-5]  means  "leave away the 5-th element of x"
>>      >>
>>      >> and if there is no 5-th element of x, leaving it away should be a no-op.
>>      >>
>>      >> After all this musing and history detection, my gut decision
>>      >> would be to only change the documentation which Ross forgot to change.
>>      >>
>>      >> But of course, it may be interesting to hear other programmeR's feedback on this.
>>      >>
>>      >> Martin
>>
>

-- 
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpages at fredhutch.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319